Global free energy scoring functions based on distance-dependent atom-type pair descriptors

J Chem Inf Model. 2011 Mar 28;51(3):707-20. doi: 10.1021/ci100473d. Epub 2011 Feb 22.

Abstract

Scoring functions for protein-ligand docking have received much attention in the past two decades. In many cases, remarkable success has been demonstrated in predicting the correct geometry of interaction. On independent test sets, however, the predicted binding energies or scores correlate only weakly with the observed free energies of binding. In this study, we analyze how well free energies of binding can be predicted on the basis of crystal structures using traditional QSAR techniques in a proteochemometric approach. We introduce a new set of protein-ligand interaction descriptors on the basis of distance-binned Crippen-like atom type pairs. A subset of the publicly available PDBbind09-CN refined set (MW < 900 g/mol, #P < 2, ndon + nacc < 20; N = 1387) is used as the data set. We demonstrate how simple, yet surprisingly good, scoring functions can be generated for the whole diverse database (R² (out-of-bag) = 0.48, Rp = 0.69, RMSE = 1.44, MUE = 1.14) and for individual protein family subsets. This performance is significantly better than that of almost all other published scoring functions that have been validated on a test set as large and diverse as the PDBbind refined set. We also find that for some protein families surprisingly good scoring functions can be obtained using simple ligand-only descriptors such as logS, logP, and molecular weight. The ligand-descriptor-based scoring function equals or even outperforms commonly used scoring functions, highlighting the need for better scoring functions. We demonstrate how the observed performance depends on the validation strategy, and we outline a general validation protocol for future free energy scoring functions.
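
The idea of a distance-binned atom-type pair descriptor can be illustrated with a minimal sketch. It counts, for each combination of protein atom type, ligand atom type, and distance bin, how many protein-ligand atom pairs fall into that bin. The abstract does not give the actual atom typing scheme or bin boundaries, so the type labels (`C.ar`, `O.acc`, `N.don`, `C.al`), the bin edges in `BIN_EDGES`, and the function names below are all hypothetical placeholders, not the authors' implementation.

```python
import math
from collections import Counter

# Hypothetical bin edges in angstroms; the binning actually used
# in the paper is not stated in the abstract.
BIN_EDGES = (0.0, 3.0, 4.5, 6.0)


def distance_bin(d, edges=BIN_EDGES):
    """Return the index i of the half-open bin [edges[i], edges[i+1])
    that contains d, or None if d lies outside all bins."""
    for i in range(len(edges) - 1):
        if edges[i] <= d < edges[i + 1]:
            return i
    return None


def pair_descriptors(protein_atoms, ligand_atoms, edges=BIN_EDGES):
    """Count protein-ligand atom pairs per (protein type, ligand type, bin).

    Each atom is a (type_label, (x, y, z)) tuple. The type labels stand in
    for whatever atom typing is applied upstream (assumed, not specified).
    """
    counts = Counter()
    for p_type, p_xyz in protein_atoms:
        for l_type, l_xyz in ligand_atoms:
            b = distance_bin(math.dist(p_xyz, l_xyz), edges)
            if b is not None:
                counts[(p_type, l_type, b)] += 1
    return counts


# Toy coordinates, invented purely for illustration.
protein = [("C.ar", (0.0, 0.0, 0.0)), ("O.acc", (4.0, 0.0, 0.0))]
ligand = [("N.don", (2.0, 0.0, 0.0)), ("C.al", (0.0, 5.0, 0.0))]

desc = pair_descriptors(protein, ligand)
for key in sorted(desc):
    print(key, desc[key])
```

Each complex then yields a fixed-length count vector (one entry per type pair and bin) that a conventional QSAR regression method could be trained on against measured binding free energies. Pairs beyond the outermost bin edge are simply ignored in this sketch.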

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Crystallography, X-Ray*
  • Humans
  • Molecular Structure