
With the increasing size and cost of compound screening libraries and the importance of High Throughput Screening (HTS) for drug discovery, the computational prediction of biologically active compounds is receiving growing attention. Following the similar property principle, many computational methods in this area focus on measuring the structural similarity between chemical structures. Traditional similarity measures are either too rigid or consider only global similarities between structures. The Maximum Common Substructure (MCS) approach provides a promising alternative. In this talk, I will sketch a new backtracking algorithm for MCS that takes into account special properties of chemical compound structures. This algorithm provides higher flexibility in the matching process and is effective in identifying local structural similarities. To predict and cluster biologically active compounds more efficiently, I propose the concept of basis compounds, which enables researchers to easily combine the MCS-based and traditional similarity measures with modern machine learning techniques. Support Vector Machines (SVMs) are used to test how the MCS-based similarity measure and the basis compound vectorization method perform on two empirically tested datasets. The test results show that MCS complements the well-known atom pair descriptor-based similarity measure. By combining these two measures, our SVM-based model predicts the biological activities of chemical compounds with higher specificity and sensitivity than the existing methods in the literature, and it may find important applications in in silico compound screening and sequential screening processes.
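The abstract does not spell out how basis compound vectorization works, so the following is only a minimal sketch of one plausible reading: each compound is described by its similarity to a fixed set of basis compounds, which gives a numeric vector that a learner such as an SVM can consume. Everything named here is an assumption made for illustration: compounds are represented as sets of feature labels, the Tanimoto coefficient stands in for the MCS-based measure, and `tanimoto`, `vectorize`, and the example data are hypothetical.

```python
from typing import Callable, FrozenSet, List, Sequence

# Hypothetical stand-in for a compound: a set of structural feature labels.
# A real system would use molecular graphs and an MCS-based similarity.
Compound = FrozenSet[str]


def tanimoto(a: Compound, b: Compound) -> float:
    """Tanimoto (Jaccard) coefficient between two feature sets."""
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(a | b)


def vectorize(
    compound: Compound,
    basis: Sequence[Compound],
    similarity: Callable[[Compound, Compound], float] = tanimoto,
) -> List[float]:
    """Map a compound to its similarities against each basis compound."""
    return [similarity(compound, b) for b in basis]


# Illustrative data only, not taken from the talk.
basis = [
    frozenset({"C-C", "C-N", "C=O"}),
    frozenset({"C-C", "C-O"}),
]
query = frozenset({"C-C", "C=O"})

# One coordinate per basis compound.
vector = vectorize(query, basis)
```

Because `similarity` is a parameter, several measures could be applied to the same basis set and their vectors concatenated, which is one way the combination of MCS-based and atom pair measures mentioned above might be fed to a single classifier.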


International Society for Computational Biology grants affiliate status to the Ohio Bioinformatics Consortium
Ohio Regional Student Group
Click on the links below for the winners of the poster and paper awards at the Ohio Collaborative Conference on Bioinformatics 2009.
Paper awards.
Poster awards.