True Accuracy of Fast Scoring Functions to Predict High-Throughput Screening Data from Docking Poses: The Simpler the Better

Viet-Khoa Tran-Nguyen; Guillaume Bret; Didier Rognan

doi:10.1021/acs.jcim.1c00292

J Chem Inf Model. 2021 Jun 28;61(6):2788-2797. doi: 10.1021/acs.jcim.1c00292. Epub 2021 Jun 10.

Authors

Viet-Khoa Tran-Nguyen¹, Guillaume Bret¹, Didier Rognan¹

Affiliation

¹ Laboratoire d'Innovation Thérapeutique, UMR 7200 CNRS-Université de Strasbourg, 67400 Illkirch, France.

PMID: 34109796

Abstract

Hundreds of fast scoring functions have been developed over the last 20 years to predict binding free energies from three-dimensional structures of protein-ligand complexes. Despite numerous statistical promises, we believe that none of them has been properly validated for daily prospective high-throughput virtual screening studies, mostly because in silico screening challenges usually employ artificially built and biased datasets. Here we carry out a fully unbiased evaluation of four scoring functions (Pafnucy, Δ_vinaRF₂₀, IFP, and GRIM) on an in-house developed data collection of experimental high-confidence screening data (LIT-PCBA) covering about 3 million data points on 15 diverse pharmaceutical targets. All four scoring functions were applied to rescore the docking poses of LIT-PCBA compounds under conditions that exactly mimic standard drug discovery scenarios and were compared in terms of their propensity to enrich true binders in the top 1%-ranked hit lists. Interestingly, rescoring based on simple interaction fingerprints or interaction graphs outperforms state-of-the-art machine learning and deep learning scoring functions in most cases. The current study notably highlights the strong tendency of deep learning methods to predict affinity values within a very narrow range centered on the mean value of samples used for training. Moreover, it suggests that knowledge of pre-existing binding modes is the key to detecting the most potent binders.
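
The comparison criterion mentioned above, enrichment of true binders in the top 1% of a ranked hit list, can be illustrated with a minimal Python sketch. The abstract does not give the exact formula, so this assumes the commonly used enrichment factor: the fraction of actives in the top-ranked subset divided by the fraction of actives in the whole library. The function name `enrichment_factor`, its parameters, and the convention that a higher score means a better-ranked compound are illustrative assumptions, not the authors' implementation.

```python
def enrichment_factor(scores, labels, fraction=0.01):
    """Enrichment factor in the top `fraction` of a ranked list.

    Assumptions (not taken from the paper):
      - higher score = predicted to be a better binder;
      - labels are 1 for true actives and 0 for inactives;
      - the top subset size is fraction * N, rounded, with a minimum of 1.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_total = len(scores)
    n_actives = sum(labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one compound and one active")

    # Number of compounds retained in the top-ranked subset.
    n_top = max(1, round(fraction * n_total))

    # Rank compounds from best (highest) to worst (lowest) score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_top = sum(label for _, label in ranked[:n_top])

    # Hit rate in the top subset relative to the hit rate of the full library.
    return (hits_top / n_top) / (n_actives / n_total)
```

For instance, in a hypothetical library of 200 compounds with 4 actives, the top 1% holds 2 compounds; if one of them is active, the enrichment factor is (1/2)/(4/200) = 25. A value of 1 corresponds to random selection.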

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Binding Sites
High-Throughput Screening Assays*
Ligands
Molecular Docking Simulation
Prospective Studies
Protein Binding
Proteins* / metabolism

Substances

Ligands
Proteins