Clustering and rule-based classifications of chemical structures evaluated in the biological activity space

J Chem Inf Model. 2007 Mar-Apr;47(2):325-36. doi: 10.1021/ci6004004. Epub 2007 Feb 8.

Abstract

Methods for classifying data sets of molecules by chemical structure were evaluated for their biological relevance. The methods included rule-based, scaffold-oriented classifications and clustering based on molecular descriptors. Three data sets from uniformly determined in vitro biological profiling experiments were classified by chemical structure, and the results were compared in a Pareto analysis with two competing objectives, both to be minimized: the number of classes and their average spread in the profile space. No classification method was superior overall to all the others studied. As a general trend, however, rule-based, scaffold-oriented methods are the better choice when classes with homogeneous biological activity are required and a large number of classes can be tolerated. Clustering based on chemical fingerprints, by contrast, is superior when fewer and larger classes are required and some loss of homogeneity in biological activity can be accepted.
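
The Pareto comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not define how spread is measured, so the code assumes the spread of a class is the mean Euclidean distance of its members to the class centroid in the profile space. The function names, the toy profiles, and the three labelings are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from math import dist


def average_spread(profiles, labels):
    """Return (number of classes, average spread) for one classification.

    Assumption: the spread of a class is the mean Euclidean distance of its
    members to the class centroid; the paper may use a different measure.
    """
    groups = defaultdict(list)
    for profile, label in zip(profiles, labels):
        groups[label].append(profile)
    spreads = []
    for members in groups.values():
        centroid = [sum(column) / len(members) for column in zip(*members)]
        spreads.append(sum(dist(m, centroid) for m in members) / len(members))
    return len(groups), sum(spreads) / len(spreads)


def pareto_front(results):
    """Return the names of methods not dominated in (n_classes, spread).

    Both objectives are minimized. A method is dominated if another method is
    at least as good on both objectives and strictly better on one.
    """
    front = []
    for name, (n, s) in results.items():
        dominated = any(
            n2 <= n and s2 <= s and (n2 < n or s2 < s)
            for other, (n2, s2) in results.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return sorted(front)


# Toy biological profiles for four molecules (hypothetical values).
profiles = [(0.0, 0.0), (0.0, 2.0), (4.0, 0.0), (4.0, 2.0)]

# Hypothetical class labels produced by three classification methods.
labelings = {
    "scaffold": [0, 1, 2, 3],     # many small, homogeneous classes
    "fingerprint": [0, 0, 1, 1],  # fewer, larger classes
    "random": [0, 1, 0, 1],       # fewer classes, poor homogeneity
}

results = {name: average_spread(profiles, lab) for name, lab in labelings.items()}
print(results)
print(pareto_front(results))
```

In this toy case the "scaffold" labeling gives the most homogeneous classes at the cost of more classes, the "fingerprint" labeling gives fewer classes with some spread, and the "random" labeling is dominated by "fingerprint". This mirrors the kind of trade-off the abstract reports, but only as an illustration, not as a reproduction of the study.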

MeSH terms

  • Cluster Analysis
  • Computational Biology / statistics & numerical data*
  • Databases, Genetic
  • Models, Chemical*
  • Molecular Structure
  • Pharmaceutical Preparations / chemistry*
  • Pharmaceutical Preparations / classification*
  • Structure-Activity Relationship

Substances

  • Pharmaceutical Preparations