Structure-based classification of active and inactive estrogenic compounds by decision tree, LVQ and kNN methods

Chemosphere. 2006 Jan;62(4):658-73. doi: 10.1016/j.chemosphere.2005.04.115. Epub 2005 Jun 29.

Abstract

The performance of decision tree (DT), learning vector quantization (LVQ), and k-nearest neighbour (kNN) methods in classifying estrogenic compounds as active or inactive on the basis of their structure-activity relationships (SAR) was evaluated. A set of 311 compounds was used to build the models, and their predictive power was verified using separate training and test sets. Principal components derived from molecular descriptors calculated with DRAGON software were used as the variables representing the structures of the compounds. Broadly, kNN had the best classification ability and DT the weakest, although the performance of each method depended on the group of compounds used for modelling. The best performance was obtained with kNN for the calf estrogen receptor data, with an average of 98.3% of compounds correctly classified in the external tests. Overall, the results indicate that all the methods tested are suitable for the SAR classification of estrogenic compounds, producing models whose predictive power ranges from adequate to excellent.
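The general workflow described above involves compressing molecular descriptors into principal components, then classifying compounds with kNN and checking the result on a held-out test set. The sketch below shows that workflow in NumPy under several stated assumptions. It is not the authors' implementation. The descriptor matrix is synthetic toy data standing in for DRAGON output. Autoscaling before PCA, Euclidean distance, and the values of `n_components` and `k` are illustrative choices, and the function names `pca_scores` and `knn_predict` are hypothetical. The DT and LVQ models are not shown.

```python
import numpy as np

def pca_scores(X_train, X_test, n_components):
    """Project descriptors onto principal components fitted on the training set only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant descriptors
    Ztr = (X_train - mean) / std  # autoscaling is an assumption, not taken from the paper
    Zte = (X_test - mean) / std
    # Right singular vectors of the centred, scaled training matrix are the PC loadings
    _, _, Vt = np.linalg.svd(Ztr, full_matrices=False)
    W = Vt[:n_components].T
    return Ztr @ W, Zte @ W

def knn_predict(train_X, train_y, test_X, k=3):
    """Binary kNN: majority vote among the k nearest training compounds (Euclidean)."""
    # Pairwise squared distances, shape (n_test, n_train)
    d2 = ((test_X[:, None, :] - train_X[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    votes = train_y[nearest]  # labels: 1 = active, 0 = inactive
    # With odd k, "more than half the votes are 1" is a strict majority
    return (votes.sum(axis=1) * 2 > k).astype(int)

# Hypothetical toy data: two Gaussian clusters of 20 "descriptors" (NOT the paper's data)
rng = np.random.default_rng(0)
X_active = rng.normal(loc=1.0, scale=0.5, size=(40, 20))
X_inactive = rng.normal(loc=-1.0, scale=0.5, size=(40, 20))
X = np.vstack([X_active, X_inactive])
y = np.array([1] * 40 + [0] * 40)

# Random split into separate training and test sets
idx = rng.permutation(len(y))
train, test = idx[:60], idx[60:]

P_train, P_test = pca_scores(X[train], X[test], n_components=3)
pred = knn_predict(P_train, y[train], P_test, k=3)
accuracy = (pred == y[test]).mean()
print(f"{accuracy:.1%} of test compounds correctly classified")
```

The PCA loadings are fitted on the training compounds only. This keeps the test set genuinely external, so no information from the test compounds leaks into the model.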

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Cattle
  • Decision Trees
  • Estrogens / chemistry
  • Estrogens / classification*
  • Estrogens / metabolism
  • Humans
  • Mice
  • Models, Molecular*
  • Neural Networks, Computer
  • Principal Component Analysis
  • Rats
  • Receptors, Estrogen / metabolism*
  • Structure-Activity Relationship*

Substances

  • Estrogens
  • Receptors, Estrogen