Bayesian interpretation of a distance function for navigating high-dimensional descriptor spaces

J Chem Inf Model. 2007 Jan-Feb;47(1):39-46. doi: 10.1021/ci600280b.

Abstract

A distance function for analyzing molecular similarity relationships in high-dimensional descriptor spaces, and for focusing search calculations on "active subspaces", is defined in Bayesian terms. As a measure of similarity, database compounds are ranked according to their distance from the center of a subspace formed by known active molecules. From a Bayesian point of view, distance calculations are transformed into a "log-odds" estimate. Following this approach, maximizing the likelihood that a compound is active corresponds to minimizing its distance from the center of an active subspace. Because the methodology ranks database molecules by decreasing similarity to template compounds, it can be directly compared to similarity search tools. In multiple template-based database searching, the Bayesian function is found to compare favorably to two standard fingerprints.
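
The correspondence between distance and log-odds can be illustrated with a minimal sketch. It is not the function defined in the paper, which the abstract does not spell out. It assumes that each descriptor is independent and Gaussian-distributed, both among the known actives and across the database. Under that assumption, the log-odds of activity contains a variance-weighted squared distance from the active center with a negative sign, so compounds closer to the center score higher. All function names (`fit_gaussians`, `log_odds`, `rank_by_log_odds`) are hypothetical.

```python
import math

def fit_gaussians(rows):
    """Per-descriptor mean and standard deviation of a list of descriptor vectors."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    stds = []
    for j in range(dims):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / n
        stds.append(math.sqrt(var) or 1e-6)  # guard against zero variance
    return means, stds

def log_odds(x, active_stats, database_stats):
    """Log of p(x | active) / p(x | database), assuming independent Gaussians."""
    (mu_a, sd_a), (mu_d, sd_d) = active_stats, database_stats
    total = 0.0
    for xj, ma, sa, md, sd in zip(x, mu_a, sd_a, mu_d, sd_d):
        total += math.log(sd / sa)                 # normalization constants
        total -= (xj - ma) ** 2 / (2 * sa ** 2)    # weighted distance to active center
        total += (xj - md) ** 2 / (2 * sd ** 2)    # weighted distance to database center
    return total

def rank_by_log_odds(database, actives):
    """Indices of database compounds, highest log-odds (most active-like) first."""
    active_stats = fit_gaussians(actives)
    database_stats = fit_gaussians(database)
    scores = [log_odds(x, active_stats, database_stats) for x in database]
    return sorted(range(len(database)), key=lambda i: scores[i], reverse=True)
```

For example, `rank_by_log_odds(database, actives)` with a handful of toy two-descriptor vectors places compounds lying near the mean of `actives` at the top of the list, which mirrors the ranking by distance from the center of the active subspace described above.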

MeSH terms

  • Bayes Theorem*
  • Databases, Factual*
  • Molecular Structure*
  • Structure-Activity Relationship