Bayesian interpretation of a distance function for navigating high-dimensional descriptor spaces

Martin Vogt; Jeffrey W Godden; Jürgen Bajorath

doi:10.1021/ci600280b

J Chem Inf Model. 2007 Jan-Feb;47(1):39-46. doi: 10.1021/ci600280b.

Authors

Martin Vogt¹, Jeffrey W Godden, Jürgen Bajorath

Affiliation

¹ Department of Life Science Informatics, B-IT, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstrasse 2, D-53113 Bonn, Germany.

PMID: 17238247
DOI: 10.1021/ci600280b

Abstract

A distance function for analyzing molecular similarity relationships in high-dimensional descriptor spaces, and for focusing search calculations on "active subspaces", is defined in Bayesian terms. Database compounds are ranked according to their distance from the center of a subspace formed by known active molecules, and this distance serves as the measure of similarity. From a Bayesian point of view, the distance calculations are transformed into a "log-odds" estimate. Under this approach, maximizing the likelihood that a compound is active corresponds to minimizing its distance from the center of an active subspace. Because the methodology ranks database molecules by decreasing similarity to template compounds, it can be compared directly to similarity search tools, and the Bayesian function is found to compare favorably to two standard fingerprints in database searching based on multiple templates.
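The link between likelihood and distance described above can be illustrated with a minimal sketch. It is not the authors' distance function, which the abstract does not spell out. It assumes, purely for illustration, that each descriptor of the active compounds follows an independent Gaussian distribution and that the database background is flat, so the log-odds differs from the log-likelihood only by a constant. Under those assumptions, ranking by increasing variance-scaled distance from the center of the actives gives the same order as ranking by decreasing log-likelihood. All function names and the toy descriptor values are hypothetical.

```python
import math


def fit_active_subspace(actives):
    """Estimate per-descriptor mean and standard deviation from known actives."""
    n = len(actives)
    dims = len(actives[0])
    means = [sum(row[j] for row in actives) / n for j in range(dims)]
    stds = []
    for j in range(dims):
        var = sum((row[j] - means[j]) ** 2 for row in actives) / n
        # Fall back to 1.0 when a descriptor has no spread among the actives.
        stds.append(math.sqrt(var) or 1.0)
    return means, stds


def scaled_distance(x, means, stds):
    """Variance-scaled Euclidean distance from the center of the actives."""
    return math.sqrt(sum(((xj - m) / s) ** 2 for xj, m, s in zip(x, means, stds)))


def log_likelihood(x, means, stds):
    """Log-likelihood of x under the assumed independent Gaussian model."""
    return sum(
        -0.5 * ((xj - m) / s) ** 2 - math.log(s * math.sqrt(2.0 * math.pi))
        for xj, m, s in zip(x, means, stds)
    )


def rank_by_distance(database, means, stds):
    """Database indices ordered from most to least similar (smallest distance first)."""
    return sorted(range(len(database)),
                  key=lambda i: scaled_distance(database[i], means, stds))


def rank_by_likelihood(database, means, stds):
    """Database indices ordered by decreasing log-likelihood of activity."""
    return sorted(range(len(database)),
                  key=lambda i: -log_likelihood(database[i], means, stds))


# Toy data (made up): two descriptors, two actives, three database compounds.
actives = [[1.0, 2.0], [3.0, 4.0]]
database = [[5.0, 5.0], [2.0, 3.0], [2.0, 4.0]]

means, stds = fit_active_subspace(actives)
ranking = rank_by_distance(database, means, stds)
likelihood_ranking = rank_by_likelihood(database, means, stds)
print(ranking, likelihood_ranking)
```

The two rankings agree because, in this simplified model, the log-likelihood equals a constant minus half the squared scaled distance. A non-flat background distribution would add a compound-dependent term to the log-odds, and the equivalence would then hold only for the correspondingly modified distance.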

MeSH terms

Bayes Theorem*
Databases, Factual*
Molecular Structure*
Structure-Activity Relationship