Agreement of drug discovery data with Benford's law

Expert Opin Drug Discov. 2013 Jan;8(1):1-5. doi: 10.1517/17460441.2013.740007. Epub 2012 Nov 3.

Abstract

The ever-increasing rate at which drug discovery data are generated has complicated data analysis and may have compromised data quality through factors such as data handling errors. Parallel to this concern is the rise in blatant scientific misconduct. Together, these problems highlight the importance of developing a method for systematically assessing data quality. Benford's law has been used to detect data manipulation and data fabrication in various fields. In their previous studies, the authors demonstrated that the distributions of activity and solubility data followed Benford's law. They also showed that overly stringent selection of training data sets for a regression model can disrupt agreement with Benford's law. Here, the authors present the application of Benford's law to a wider range of drug discovery data, such as microarray and sequence data. They also suggest that Benford's law could be applied to model building and to assessing model reliability in structure-activity relationship studies. Finally, the authors propose a protocol based on Benford's law that would give researchers an efficient method for data quality assessment. However, multifaceted quality control, such as combining the protocol with data visualization, may also be needed to further improve its reliability.
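For orientation, Benford's law states that the leading digit d (1 to 9) of many naturally occurring data sets appears with probability log10(1 + 1/d), so a leading 1 is expected in about 30.1% of values and a leading 9 in about 4.6%. The abstract does not describe the authors' protocol in detail, so the following is only a minimal sketch of a generic first-digit check, assuming a Pearson chi-square comparison against the Benford frequencies. The function names and the 5% critical value for 8 degrees of freedom (15.507) are illustrative choices, not taken from the editorial.

```python
import math
from collections import Counter

# Chi-square critical value for 8 degrees of freedom at the 5% level
# (assumed threshold for this sketch; not specified by the editorial).
CHI2_CRITICAL_8DF_05 = 15.507


def benford_expected():
    """Return the Benford probability of each leading digit 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(x):
    """Return the first significant digit of x, or None for 0, NaN or inf."""
    x = abs(x)
    if x == 0 or not math.isfinite(x):
        return None
    # Scientific notation puts the first significant digit at position 0.
    return int(f"{x:.15e}"[0])


def benford_chi_square(values):
    """Pearson chi-square statistic of observed vs. Benford first digits."""
    digits = [d for d in map(first_digit, values) if d is not None]
    n = len(digits)
    if n == 0:
        raise ValueError("no usable (non-zero, finite) values")
    counts = Counter(digits)
    return sum(
        (counts.get(d, 0) - n * p) ** 2 / (n * p)
        for d, p in benford_expected().items()
    )


def agrees_with_benford(values, critical=CHI2_CRITICAL_8DF_05):
    """True if the chi-square statistic stays below the critical value."""
    return benford_chi_square(values) < critical


if __name__ == "__main__":
    # Powers of 2 are a classic Benford-like sequence; two-digit integers
    # have uniformly distributed leading digits and should fail the check.
    powers_of_two = [2.0 ** k for k in range(1, 501)]
    two_digit = list(range(10, 100))
    print(agrees_with_benford(powers_of_two))
    print(agrees_with_benford(two_digit))
```

A check of this kind is sensitive to sample size and to data that span only a narrow numeric range, which is consistent with the abstract's caution that Benford-based screening should be combined with other quality controls such as data visualization.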

Publication types

  • Editorial

MeSH terms

  • Databases, Factual / standards*
  • Drug Discovery / methods
  • Drug Discovery / standards*
  • Drug Discovery / statistics & numerical data
  • Humans
  • Quality Control
  • Reproducibility of Results
  • Research Design / standards*
  • Scientific Misconduct
  • Statistics as Topic / standards