Design and characterization of chemical space networks for different compound data sets

J Comput Aided Mol Des. 2015 Feb;29(2):113-25. doi: 10.1007/s10822-014-9821-4. Epub 2014 Dec 3.

Abstract

Chemical Space Networks (CSNs) are generated for different compound data sets on the basis of pairwise similarity relationships. Such networks are thought to complement and further extend traditional coordinate-based views of chemical space. Our proof-of-concept study focuses on CSNs based upon fingerprint similarity relationships calculated using the conventional Tanimoto similarity metric. The resulting CSNs are characterized with statistical measures from network science and compared in different ways. We show that the homophily principle, which is widely considered in the context of social networks, is a major determinant of the topology of CSNs of bioactive compounds designed as threshold networks and typically gives rise to community structures. Many properties of CSNs are influenced by numerical features of the conventional Tanimoto similarity metric and are largely dominated by the edge density of the networks, which depends on the chosen similarity threshold. However, properties of different CSNs with constant edge density can be directly compared, revealing systematic differences between CSNs generated from randomly collected compounds and those generated from bioactive compounds.
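The threshold-network idea summarized in the abstract can be illustrated with a minimal Python sketch. It assumes binary fingerprints are represented as sets of on-bit indices (no cheminformatics toolkit is used), connects two compounds when their Tanimoto coefficient |A ∩ B| / |A ∪ B| reaches a chosen threshold, and reports the resulting edge density. Because density depends on the threshold, the sketch also picks a threshold from the sorted pairwise similarities so that networks can be compared at (approximately) constant edge density. All function names are hypothetical, and the same-class edge fraction is only a simple stand-in indicator of homophily, not necessarily the measure used in the study.

```python
from __future__ import annotations

from itertools import combinations
from typing import Hashable, Sequence


def tanimoto(fp_a: frozenset, fp_b: frozenset) -> float:
    """Tanimoto coefficient for binary fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # assumed convention: two empty fingerprints get similarity 0
    return len(fp_a & fp_b) / union


def threshold_network(fps: Sequence[frozenset], threshold: float) -> list[tuple[int, int]]:
    """Edges (i, j), i < j, for every compound pair with Tanimoto similarity >= threshold."""
    return [
        (i, j)
        for i, j in combinations(range(len(fps)), 2)
        if tanimoto(fps[i], fps[j]) >= threshold
    ]


def edge_density(n_nodes: int, edges: list[tuple[int, int]]) -> float:
    """Fraction of all n(n-1)/2 possible compound pairs that are connected."""
    possible = n_nodes * (n_nodes - 1) / 2
    return len(edges) / possible if possible else 0.0


def threshold_for_density(fps: Sequence[frozenset], target_density: float) -> float:
    """Hypothetical helper: threshold giving roughly the target edge density.

    Takes the k-th largest pairwise similarity, with k = target_density * number of pairs.
    Ties at that similarity value can add extra edges, so the density is approximate.
    """
    sims = sorted((tanimoto(a, b) for a, b in combinations(fps, 2)), reverse=True)
    k = max(1, round(target_density * len(sims)))
    return sims[min(k, len(sims)) - 1]


def same_class_edge_fraction(edges: list[tuple[int, int]], labels: Sequence[Hashable]) -> float:
    """Simple homophily indicator (assumption): share of edges joining same-class compounds."""
    if not edges:
        return 0.0
    return sum(labels[i] == labels[j] for i, j in edges) / len(edges)


if __name__ == "__main__":
    # Toy fingerprints: compounds 0-2 share bits 1 and 2; compounds 3-4 share bits 7 and 8.
    fps = [frozenset({1, 2, 3, 4}), frozenset({1, 2, 3, 5}), frozenset({1, 2, 6}),
           frozenset({7, 8, 9}), frozenset({7, 8, 10})]
    labels = ["A", "A", "A", "B", "B"]
    edges = threshold_network(fps, 0.4)
    print(edges, edge_density(len(fps), edges), same_class_edge_fraction(edges, labels))
    t = threshold_for_density(fps, 0.2)
    print(t, edge_density(len(fps), threshold_network(fps, t)))
```

With these toy data, a threshold of 0.4 yields four edges (density 0.4), all joining compounds of the same class, while requesting a density of 0.2 raises the threshold to 0.5 and leaves two edges. Selecting the threshold by target density rather than fixing it is one way to hold edge density constant when comparing CSNs built from different compound sets.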

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Datasets as Topic*
  • Models, Chemical*
  • Models, Theoretical*
  • Statistics as Topic