The experimental uncertainty of heterogeneous public K(i) data

J Med Chem. 2012 Jun 14;55(11):5165-73. doi: 10.1021/jm300131x. Epub 2012 May 29.

Abstract

The maximum achievable accuracy of in silico models depends on the quality of the experimental data. Consequently, experimental uncertainty defines a natural upper limit to the predictive performance possible. Models that yield errors smaller than the experimental uncertainty are necessarily overtrained. A reliable estimate of the experimental uncertainty is therefore of high importance to all originators and users of in silico models. The data deposited in ChEMBL was analyzed for reproducibility, i.e., the experimental uncertainty of independent measurements. Careful filtering of the data was required because ChEMBL contains unit-transcription errors, undifferentiated stereoisomers, and repeated citations of single measurements (90% of all pairs). The experimental uncertainty is estimated to yield a mean error of 0.44 pK(i) units, a standard deviation of 0.54 pK(i) units, and a median error of 0.34 pK(i) units. The maximum possible squared Pearson correlation coefficient (R(2)) on large data sets is estimated to be 0.81.
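The kind of summary statistics reported above can be illustrated with a minimal sketch. It is not the authors' procedure. It assumes that each compound-target system has two independent pK(i) measurements, that both carry the same independent error, and that the per-measurement standard deviation can therefore be taken as the root-mean-square pair difference divided by the square root of 2. The function names `pair_uncertainty` and `max_r2` and the example values are hypothetical. How these statistics correspond to the figures reported in the abstract is likewise an assumption.

```python
import math
import statistics


def pair_uncertainty(pairs):
    """Summarize the disagreement between paired pK(i) measurements.

    `pairs` is a list of (first, second) measurements of the same
    compound-target system, assumed to be independent of each other.
    """
    diffs = [a - b for a, b in pairs]
    abs_diffs = [abs(d) for d in diffs]
    # A pair difference carries the error of two measurements. Assuming
    # equal, independent errors, one measurement has RMS(diff) / sqrt(2).
    rms_diff = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return {
        "mean_abs_diff": statistics.fmean(abs_diffs),
        "median_abs_diff": statistics.median(abs_diffs),
        "sd_per_measurement": rms_diff / math.sqrt(2),
    }


def max_r2(sd_true, sd_exp):
    """Expected squared Pearson correlation between noise-free values
    (spread sd_true) and measurements with independent additive noise
    (standard deviation sd_exp)."""
    return sd_true ** 2 / (sd_true ** 2 + sd_exp ** 2)


# Hypothetical example data, not taken from ChEMBL.
example_pairs = [(7.0, 7.4), (6.5, 6.1), (8.2, 8.2), (5.0, 5.6)]
summary = pair_uncertainty(example_pairs)
```

The second function shows why noise caps the achievable correlation: even a perfect model is scored against noisy measurements, so the attainable R(2) shrinks as the experimental standard deviation grows relative to the spread of the true values in the data set.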

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology / methods*
  • Computer Simulation*
  • Databases, Factual / statistics & numerical data*
  • Drug Discovery / methods*
  • Ligands
  • Molecular Structure
  • Proteins / chemistry
  • Quantitative Structure-Activity Relationship
  • Reproducibility of Results
  • Stereoisomerism
  • Uncertainty*

Substances

  • Ligands
  • Proteins