True Accuracy of Fast Scoring Functions to Predict High-Throughput Screening Data from Docking Poses: The Simpler the Better

J Chem Inf Model. 2021 Jun 28;61(6):2788-2797. doi: 10.1021/acs.jcim.1c00292. Epub 2021 Jun 10.

Abstract

Hundreds of fast scoring functions have been developed over the last 20 years to predict binding free energies from three-dimensional structures of protein-ligand complexes. Despite numerous statistical promises, we believe that none of them has been properly validated for daily prospective high-throughput virtual screening studies, mostly because in silico screening challenges usually employ artificially built and biased datasets. We here carry out a fully unbiased evaluation of four scoring functions (Pafnucy, ΔvinaRF20, IFP, and GRIM) on an in-house developed data collection of experimental high-confidence screening data (LIT-PCBA) covering about 3 million data points on 15 diverse pharmaceutical targets. All four scoring functions were applied to rescore the docking poses of LIT-PCBA compounds in conditions mimicking exactly standard drug discovery scenarios and were compared in terms of propensity to enrich true binders in the top 1%-ranked hit lists. Interestingly, rescoring based on simple interaction fingerprints or interaction graphs outperforms state-of-the-art machine learning and deep learning scoring functions in most of the cases. The current study notably highlights the strong tendency of deep learning methods to predict affinity values within a very narrow range centered on the mean value of samples used for training. Moreover, it suggests that knowledge of pre-existing binding modes is the key to detecting the most potent binders.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Binding Sites
  • High-Throughput Screening Assays*
  • Ligands
  • Molecular Docking Simulation
  • Prospective Studies
  • Protein Binding
  • Proteins* / metabolism

Substances

  • Ligands
  • Proteins