Rethinking molecular similarity: comparing compounds on the basis of biological activity

ACS Chem Biol. 2012 Aug 17;7(8):1399-409. doi: 10.1021/cb3001028. Epub 2012 May 31.

Abstract

Since the advent of high-throughput screening (HTS), there has been an urgent need for methods that facilitate the interrogation of large-scale chemical biology data to build a mode of action (MoA) hypothesis. This can be done either prior to the HTS by subset design of compounds with known MoA or post-HTS by data annotation and mining. To enable this process, we developed a tool that compares compounds solely on the basis of their bioactivity: the chemical biological descriptor "high-throughput screening fingerprint" (HTS-FP). In the current embodiment, data are aggregated from 195 biochemical and cell-based assays developed at Novartis and can be used to identify bioactivity relationships among the in-house collection comprising ~1.5 million compounds. We demonstrate the value of the HTS-FP for virtual screening and in particular scaffold hopping. HTS-FP outperforms state-of-the-art methods in several aspects, retrieving bioactive compounds with remarkable chemical dissimilarity to a probe structure. We also apply HTS-FP for the design of screening subsets in HTS. Using retrospective data, we show that a biodiverse selection of plates performs significantly better than a chemically diverse selection of plates, both in terms of number of hits and diversity of chemotypes retrieved. This is also true in the case of hit expansion predictions using HTS-FP similarity. Sets of compounds clustered with HTS-FP are biologically meaningful, in the sense that these clusters enrich for genes and gene ontology (GO) terms, showing that compounds that are bioactively similar also tend to target proteins that operate together in the cell. HTS-FP are valuable not only because of their predictive power but mainly because they relate compounds solely on the basis of bioactivity, harnessing the accumulated knowledge of a high-throughput screening facility toward the understanding of how compounds interact with the proteome.
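The idea of comparing compounds purely by bioactivity can be illustrated with a minimal sketch. The abstract does not specify how fingerprints are encoded or which similarity measure is used, so everything below is an assumption made for illustration: each fingerprint is a mapping from a hypothetical assay identifier to a normalized activity score, similarity is the Pearson correlation over the assays in which both compounds were tested, and the names `bioactivity_similarity`, `rank_by_bioactivity`, and `min_shared` are invented here.

```python
from math import sqrt


def bioactivity_similarity(fp_a, fp_b, min_shared=3):
    """Pearson correlation over assays in which both compounds were tested.

    Assumption: fingerprints are dicts of assay id -> normalized activity.
    Returns None when too few assays are shared or when either profile is
    constant over the shared assays (correlation is undefined).
    """
    shared = sorted(set(fp_a) & set(fp_b))
    if len(shared) < min_shared:
        return None
    xs = [fp_a[assay] for assay in shared]
    ys = [fp_b[assay] for assay in shared]
    n = len(shared)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


def rank_by_bioactivity(probe_fp, library, min_shared=3):
    """Rank library compounds by similarity to a probe, most similar first.

    Compounds without a defined similarity to the probe are skipped.
    """
    scored = []
    for name, fp in library.items():
        score = bioactivity_similarity(probe_fp, fp, min_shared)
        if score is not None:
            scored.append((name, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy data; the assay ids and scores are not from the paper.
probe = {"assay_1": 4.0, "assay_2": 0.0, "assay_3": 2.0, "assay_4": -1.0}
library = {
    # Same activity pattern as the probe, at twice the magnitude.
    "cmpd_A": {"assay_1": 8.0, "assay_2": 0.0, "assay_3": 4.0, "assay_4": -2.0},
    # Mirror-image activity pattern.
    "cmpd_B": {"assay_1": -4.0, "assay_2": 0.0, "assay_3": -2.0, "assay_4": 1.0},
    # Shares only one assay with the probe, so it is skipped.
    "cmpd_C": {"assay_1": 3.0, "assay_5": 1.0},
}

ranking = rank_by_bioactivity(probe, library)
```

Because no chemical structure enters the calculation, a ranking of this kind can place structurally unrelated compounds next to each other, which is the property the abstract associates with scaffold hopping. Restricting the comparison to shared assays is one simple way to handle compounds that were not run in every screen; the actual treatment of missing data in HTS-FP may differ.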

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Biochemistry / methods
  • Chemistry, Pharmaceutical / methods*
  • Cluster Analysis
  • Computational Biology / methods
  • Drug Design
  • Drug Evaluation, Preclinical / methods
  • High-Throughput Screening Assays / methods*
  • Humans
  • Ligands
  • Models, Chemical
  • Models, Molecular
  • Molecular Conformation
  • Quantitative Structure-Activity Relationship

Substances

  • Ligands