Visualization of molecular fingerprints

John R Owen; Ian T Nabney; José L Medina-Franco; Fabian López-Vallejo

doi:10.1021/ci1004042

J Chem Inf Model. 2011 Jul 25;51(7):1552-63. doi: 10.1021/ci1004042. Epub 2011 Jul 8.

Authors

John R Owen¹, Ian T Nabney, José L Medina-Franco, Fabian López-Vallejo

Affiliation

¹ Nonlinearity and Complexity Research Group, Aston University, Aston Triangle, Birmingham B4 7ET, United Kingdom.

PMID: 21696145

Abstract

A visualization plot of a molecular data set is a useful tool for gaining insight into a set of molecules. In chemoinformatics, most visualization plots are of molecular descriptors, and the statistical model most often used to produce a visualization is principal component analysis (PCA). This paper takes PCA, together with four other statistical models (NeuroScale, GTM, LTM, and LTM-LIN), and evaluates their ability to produce clustering in visualizations not of molecular descriptors but of molecular fingerprints. Two different tasks are addressed: understanding structural information (particularly combinatorial libraries) and relating structure to activity. The quality of the visualizations is compared both subjectively (by visual inspection) and objectively (with global distance comparisons and local k-nearest-neighbor predictors). On the data sets used to evaluate clustering by structure, LTM is found to perform significantly better than the other models. In particular, the clusters in LTM visualization space are consistent with the relationships between the core scaffolds that define the combinatorial sublibraries. On the data sets used to evaluate clustering by activity, LTM again gives the best performance but by a smaller margin. The results of this paper demonstrate the value of using both a nonlinear projection map and a Bernoulli noise model for modeling binary data.
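The abstract scores visualizations with local k-nearest-neighbor predictors and argues for a Bernoulli noise model on binary fingerprints. The following is a minimal Python sketch of those two ideas, not the authors' code. It projects bit vectors onto two principal components (plain PCA via power iteration; NeuroScale, GTM, LTM and LTM-LIN are not implemented), scores the 2-D map by leave-one-out k-NN accuracy, and evaluates the independent-bit Bernoulli log-likelihood of the kind a binary latent-variable model such as LTM is built on. The function names `pca_2d`, `knn_loo_accuracy` and `bernoulli_log_likelihood`, the choice k = 3, the Euclidean distance in the plot plane and the toy fingerprints are all illustrative assumptions; the paper's actual evaluation settings may differ.

```python
import math
import random
from collections import Counter


def pca_2d(X, n_iter=200, seed=0):
    """Project binary fingerprints (rows of X) onto their top two principal components.

    Uses power iteration with deflation on the sample covariance matrix so that no
    third-party packages are needed; fine for toy data, too slow for real libraries.
    """
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - mean[j] for j in range(d)] for row in X]
    # Sample covariance matrix (d x d) of the centred bit vectors
    C = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(2):
        v = [rng.random() for _ in range(d)]
        for _ in range(n_iter):
            w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        # Rayleigh quotient: the variance captured by this component
        lam = sum(v[i] * sum(C[i][j] * v[j] for j in range(d)) for i in range(d))
        # Deflate: remove this component so the next pass finds the following one
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
        components.append(v)
    # 2-D coordinates of each molecule in the visualization plane
    return [[sum(r[j] * c[j] for j in range(d)) for c in components] for r in Xc]


def knn_loo_accuracy(points, labels, k=3):
    """Leave-one-out k-NN classification accuracy in the projected space."""
    correct = 0
    for i, p in enumerate(points):
        # Distances from molecule i to every other molecule, paired with their labels
        neighbours = sorted(
            ((math.dist(p, q), labels[j]) for j, q in enumerate(points) if j != i),
            key=lambda t: t[0],
        )
        # Majority vote among the k nearest neighbours
        votes = Counter(label for _, label in neighbours[:k])
        predicted = votes.most_common(1)[0][0]
        correct += predicted == labels[i]
    return correct / len(points)


def bernoulli_log_likelihood(x, p, eps=1e-12):
    """Log-probability of bit vector x under independent Bernoulli bits with means p."""
    # eps guards against log(0) when a predicted bit probability is exactly 0 or 1
    return sum(
        xi * math.log(max(pi, eps)) + (1 - xi) * math.log(max(1 - pi, eps))
        for xi, pi in zip(x, p)
    )


# Hypothetical toy data: two "sublibraries" that set different blocks of bits
X = [
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 1, 1, 1],
]
labels = ["A", "A", "A", "B", "B", "B"]
print(knn_loo_accuracy(pca_2d(X), labels, k=3))  # prints 1.0
```

On this deliberately well-separated toy set even a linear projection keeps each sublibrary together; real fingerprints run to hundreds of bits, and a fitted model would supply the bit probabilities passed to the Bernoulli likelihood. Pure-Python loops are used only to keep the sketch dependency-free.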

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Combinatorial Chemistry Techniques
Drug Discovery*
Models, Statistical*
Molecular Structure
Principal Component Analysis*
Small Molecule Libraries

Substances

Small Molecule Libraries

Grants and funding

BBS/S/N/2006/13090A, Biotechnology and Biological Sciences Research Council, United Kingdom