Improving the usefulness of molecular similarity-based chemical prioritization strategies

SAR QSAR Environ Res. 2013 Aug;24(8):679-94. doi: 10.1080/1062936X.2013.792876. Epub 2013 May 28.

Abstract

Quantitative molecular similarity analysis (QMSA) is a seemingly useful tool for estimating environmental properties for the hundreds of emerging contaminants that have not yet been fully evaluated. Moreover, calibrated QMSA models are also useful for prioritizing research among currently unmeasured chemicals of interest. Previous work has demonstrated that prioritization based on molecular 'representativeness', as parameterized using summed Euclidean distances in n dimensions corresponding to n molecular descriptors, improves the prediction accuracy of QMSA models compared to random selection of compounds to be measured. In this study, we use two datasets of environmental parameters (i.e. in vitro oestrogenicity and sorption distribution coefficient Kd) to demonstrate that maximizing representativeness alone cannot deliver optimal improvement in prediction accuracy if many of the chemicals that have already been measured are themselves highly representative. Thus, proper QMSA-based prioritization among unmeasured chemicals constitutes a balance between maximizing representativeness and minimizing redundancy. It is demonstrated that redundancy considerations are especially critical for highly heterogeneous datasets, and some discussion about achieving a proper balance between the two prioritization criteria is presented.
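
As an illustration only, the balance between representativeness and redundancy described above might be sketched as follows. The abstract does not give the exact formulas, so everything here is an assumption: representativeness is taken as the negated sum of Euclidean distances from a chemical to all others in descriptor space, redundancy is approximated by closeness to the nearest already-measured chemical, and the function names, the min-max scaling and the `weight` parameter are hypothetical choices, not the authors' method.

```python
import numpy as np


def summed_distances(X):
    """Sum of Euclidean distances from each chemical (row of X) to all others."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2)).sum(axis=1)


def _scale(v):
    """Min-max scale to [0, 1]; returns zeros if all values are equal."""
    rng = v.max() - v.min()
    return (v - v.min()) / rng if rng > 0 else np.zeros_like(v)


def prioritize(X, measured, weight=0.5):
    """Rank unmeasured chemicals, best candidate first (hypothetical scoring).

    X        : (chemicals x descriptors) array-like of molecular descriptors
    measured : indices of chemicals that already have measured values
    weight   : 0 = representativeness only, 1 = non-redundancy only (assumed knob)
    """
    X = np.asarray(X, dtype=float)
    measured = np.asarray(measured, dtype=int)
    unmeasured = np.setdiff1d(np.arange(len(X)), measured)

    # Representativeness (assumed form): a small summed distance means the
    # chemical sits centrally in descriptor space, so negate the sum.
    rep = -summed_distances(X)[unmeasured]

    # Non-redundancy (assumed form): distance to the nearest measured chemical.
    if measured.size:
        d = X[unmeasured][:, None, :] - X[measured][None, :, :]
        novelty = np.sqrt((d ** 2).sum(axis=2)).min(axis=1)
    else:
        novelty = np.zeros(len(unmeasured))

    score = (1.0 - weight) * _scale(rep) + weight * _scale(novelty)
    order = np.argsort(-score, kind="stable")
    return unmeasured[order]
```

With a single descriptor and chemicals at 0.0 (already measured), 0.1, -0.1 and 3.0, setting `weight=0.0` ranks the chemical at 0.1 first because it is the most central, whereas `weight=1.0` ranks the chemical at 3.0 first because the central ones lie close to a chemical that has already been measured. Intermediate weights trade the two criteria off against each other; how to choose that balance is the question the abstract raises and is not settled by this sketch.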

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Environmental Pollutants / chemistry*
  • Environmental Pollutants / toxicity*
  • Models, Statistical
  • Organic Chemicals / chemistry*
  • Organic Chemicals / toxicity*
  • Quantitative Structure-Activity Relationship

Substances

  • Environmental Pollutants
  • Organic Chemicals