Assessing different classification methods for virtual screening

Dariusz Plewczynski; Stéphane A H Spieser; Uwe Koch

doi:10.1021/ci050519k

J Chem Inf Model. 2006 May-Jun;46(3):1098-106. doi: 10.1021/ci050519k.

Authors

Dariusz Plewczynski¹, Stéphane A H Spieser, Uwe Koch

Affiliation

¹ BioInfoBank Institute, Limanowskiego 24A/16, 60-744 Poznan, Poland. darman@bioinfo.pl

PMID: 16711730
DOI: 10.1021/ci050519k

Abstract

How well do different classification methods perform in selecting the ligands of a protein target out of large compound collections not used to train the model? Support vector machines, random forest, artificial neural networks, k-nearest-neighbor classification with genetic-algorithm-optimized feature selection, trend vectors, naïve Bayesian classification, and decision trees were used to divide databases into molecules predicted to be active and those predicted to be inactive. Training and predicted activities were treated as binary. The database was generated for the ligands of five different biological targets which have been the object of intense drug discovery efforts: HIV reverse transcriptase, COX2, dihydrofolate reductase, estrogen receptor, and thrombin. We report significant differences in the performance of the methods independent of the biological target and compound class. Different methods can have different applications; some provide particularly high enrichment, others are strong in retrieving the maximum number of actives. We also show that these methods do surprisingly well in predicting recently published ligands of a target on the basis of initial leads and that combining the results of different methods can, in certain cases, improve on the most consistent single method.
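
The distinction drawn in the abstract between high enrichment and retrieving the maximum number of actives can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not define its metrics or say how method results were combined, so the sketch uses a common convention (recall as the fraction of actives retrieved, enrichment factor as the hit rate among selected compounds divided by the hit rate in the whole database) and a simple strict-majority vote. The function names `screening_metrics` and `majority_vote` and the toy labels are hypothetical and do not come from the paper.

```python
def screening_metrics(y_true, y_pred):
    """Return (recall, enrichment_factor) for binary labels (1 = active).

    Assumed definitions, not taken from the paper:
      recall            = true positives / all actives
      enrichment factor = (true positives / selected) / (actives / total)
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    n_total = len(y_true)
    n_actives = sum(y_true)
    n_selected = sum(y_pred)
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    recall = true_pos / n_actives if n_actives else 0.0
    if n_selected == 0 or n_actives == 0:
        # Enrichment is undefined here; report 0.0 for simplicity.
        return recall, 0.0
    precision = true_pos / n_selected
    base_rate = n_actives / n_total
    return recall, precision / base_rate


def majority_vote(predictions):
    """Combine several binary prediction lists by strict majority (assumed scheme)."""
    n_methods = len(predictions)
    return [1 if 2 * sum(votes) > n_methods else 0 for votes in zip(*predictions)]


# Toy database: 10 compounds, 2 of them active (made-up labels).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
cautious = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # selects few compounds
greedy = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]    # selects many compounds
third = [0, 1, 1, 0, 0, 0, 0, 0, 0, 0]
consensus = majority_vote([cautious, greedy, third])

for name, pred in [("cautious", cautious), ("greedy", greedy), ("consensus", consensus)]:
    recall, ef = screening_metrics(y_true, pred)
    print(f"{name}: recall={recall:.2f}, enrichment={ef:.2f}")
```

In this toy case the cautious classifier gives the higher enrichment while the greedy one retrieves every active, and the majority vote falls between them on enrichment while keeping full recall. This is only meant to show the trade-off the abstract describes, not to reproduce any reported result.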

Publication types

Research Support, Non-U.S. Gov't
Validation Study

MeSH terms

Algorithms
Cyclooxygenase 2 / drug effects
Drug Design*
HIV Reverse Transcriptase / drug effects
Ligands
Receptors, Estrogen / drug effects
Tetrahydrofolate Dehydrogenase / drug effects
Thrombin / drug effects

Substances

Ligands
Receptors, Estrogen
Cyclooxygenase 2
Tetrahydrofolate Dehydrogenase
HIV Reverse Transcriptase
Thrombin