Machine Learning Consensus Scoring Improves Performance Across Targets in Structure-Based Virtual Screening

Spencer S Ericksen; Haozhen Wu; Huikun Zhang; Lauren A Michael; Michael A Newton; F Michael Hoffmann; Scott A Wildman

doi:10.1021/acs.jcim.7b00153

J Chem Inf Model. 2017 Jul 24;57(7):1579-1590. doi: 10.1021/acs.jcim.7b00153. Epub 2017 Jul 12.

Affiliation

Lauren A Michael: Center for High Throughput Computing, Department of Computer Sciences, University of Wisconsin-Madison, 1210 W. Dayton St., Madison, Wisconsin 53706, United States.

Abstract

In structure-based virtual screening, ranking compounds by a consensus of scores from several docking programs or scoring functions gives better predictive performance than ranking by the scores of a single program. It also reduces how much performance varies from target to target. Here we compare traditional consensus scoring methods with a novel, unsupervised gradient boosting approach. We also observed increased score variation among active ligands and developed a statistical mixture model consensus score that combines score means and variances. To evaluate performance, we applied two common metrics, the area under the receiver operating characteristic curve (ROC AUC) and the enrichment factor at 1% (EF1), to 21 benchmark targets from DUD-E. Traditional consensus methods, such as taking the mean of quantile-normalized docking scores, outperformed individual docking methods and were more robust to target variation. The mixture model and gradient boosting provided further improvements over the traditional consensus methods. These methods are readily applicable to new targets in academic research and avoid the potentially poor performance of relying on a single docking method for a new target.
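The simplest consensus named in the abstract is the mean of quantile-normalized docking scores. It can be sketched together with the two evaluation metrics in a few lines. This is a minimal illustration, not the authors' code, and it rests on several assumptions. First, "quantile normalization" is taken to mean replacing each program's scores with their empirical quantiles (rank divided by the number of compounds, with tied scores sharing their average rank). Second, more negative docking scores are treated as better. Third, EF1 is assumed to select at least one compound when 1% of the library rounds to zero. The function names, program names, and toy scores are hypothetical. The mixture model and gradient boosting methods are not reproduced here.

```python
"""Sketch (hypothetical helpers): mean-of-quantiles consensus score, ROC AUC, and EF1."""
from statistics import mean


def quantiles(scores, lower_is_better=True):
    """Map one program's raw scores to empirical quantiles in (0, 1]; 1.0 = best-scored compound.

    Assumption: tied scores share their average rank.
    """
    n = len(scores)
    # Orient values so that a larger value always means "predicted more active".
    vals = [-s for s in scores] if lower_is_better else list(scores)
    order = sorted(range(n), key=lambda i: vals[i])
    q = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and vals[order[j + 1]] == vals[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            q[order[k]] = avg_rank / n
        i = j + 1
    return q


def mean_quantile_consensus(score_table, lower_is_better=True):
    """score_table: {program_name: [score per compound]}, all lists in the same compound order."""
    per_program = [quantiles(s, lower_is_better) for s in score_table.values()]
    return [mean(col) for col in zip(*per_program)]


def roc_auc(scores, labels):
    """Probability that a random active outranks a random decoy (ties count 1/2); higher = better."""
    actives = [s for s, y in zip(scores, labels) if y]
    decoys = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if a > d else 0.5 if a == d else 0.0 for a in actives for d in decoys)
    return wins / (len(actives) * len(decoys))


def enrichment_factor(scores, labels, fraction=0.01):
    """Enrichment in the top `fraction` of the ranked list (EF1 when fraction=0.01); higher = better."""
    n = len(scores)
    n_top = max(1, round(fraction * n))  # assumption: always select at least one compound
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    hits = sum(labels[i] for i in ranked[:n_top])
    # (hits / n_top) / (total actives / n), written to avoid an extra float division
    return (hits * n) / (n_top * sum(labels))


# Toy data: three hypothetical docking programs on different score scales, six compounds.
table = {
    "program_A": [-9.0, -8.5, -7.0, -6.5, -8.0, -5.0],
    "program_B": [-50, -42, -40, -30, -45, -20],
    "program_C": [-11, -7, -10, -6, -5, -4],
}
labels = [1, 1, 0, 0, 0, 0]  # the first two compounds are the actives
consensus = mean_quantile_consensus(table)
print(roc_auc(quantiles(table["program_C"]), labels))  # 0.875
print(roc_auc(consensus, labels))  # 1.0
print(enrichment_factor(consensus, labels))  # 3.0
```

Each program's scores are reduced to quantiles before averaging, so programs with very different scales (kcal/mol versus arbitrary units, as in the toy table) contribute equally to the consensus. On this toy data, program_C alone ranks one decoy above an active, while the consensus does not. Only real benchmarks such as the DUD-E targets can say anything about actual performance.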

Publication types

Research Support, Non-U.S. Gov't
Research Support, N.I.H., Extramural
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Benchmarking
Drug Evaluation, Preclinical / methods*
Machine Learning*
Molecular Docking Simulation
Molecular Targeted Therapy*
Proteins / metabolism*
User-Computer Interface

Substances

Proteins
