Target-Driven Subspace Mapping Methods and Their Applicability Domain Estimation

Mol Inform. 2011 Sep;30(9):779-89. doi: 10.1002/minf.201100053. Epub 2011 Aug 3.

Abstract

This work describes a methodology to assist virtual screening of drug candidates during the early stages of drug development. The methodology aims to improve the reliability of in silico property prediction and is structured in two steps. First, a transformation is sought that maps a high-dimensional space, defined by potentially redundant or irrelevant molecular descriptors, into a low-dimensional application-related space. For this task we evaluate three different target-driven subspace mapping methods, among which we highlight the recent Correlative Matrix Mapping (CMM) as the most stable. Second, we apply an applicability domain model in the low-dimensional space to assess the confidence of compound classification. Using a probabilistic framework, the applicability domain approach identifies compounds that are poorly represented in the training set (extrapolation problems) and regions of the space where the uncertainty about the correct class is higher than normal (interpolation problems). This two-step approach represents an important contribution to the development of reliable prediction tools in chemoinformatics, a field in need of both interpretable models and methods that estimate the confidence of predictions.

Keywords: Applicability domain; Bayesian estimation; Chemoinformatics; QSAR; Subspace mapping.
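
The two-step idea in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not specify CMM or the applicability domain model in detail, so a two-class Fisher discriminant direction is swapped in as a stand-in for a target-driven mapping, and one-dimensional class-conditional Gaussians with Bayes' rule stand in for the probabilistic domain model. The data are synthetic, and every name and threshold (`fit_fisher_direction`, `density_floor`, `posterior_band`) is a hypothetical choice made for illustration.

```python
import numpy as np

def fit_fisher_direction(X, y):
    """Step 1 (stand-in, not CMM): a target-driven 1-D projection for two classes."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter, lightly regularised so the solve stays well conditioned.
    Sw = (np.cov(X0, rowvar=False) * (len(X0) - 1)
          + np.cov(X1, rowvar=False) * (len(X1) - 1))
    Sw += 1e-6 * np.eye(X.shape[1])
    w = np.linalg.solve(Sw, m1 - m0)
    return w / np.linalg.norm(w)

def gaussian_pdf(z, mean, std):
    return np.exp(-0.5 * ((z - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def fit_domain(z, y):
    """Step 2: per-class (mean, std, prior) in the low-dimensional space."""
    return {c: (z[y == c].mean(), z[y == c].std(ddof=1), np.mean(y == c))
            for c in (0, 1)}

def joint_densities(z, params):
    """Rows are prior * likelihood for class 0 and class 1."""
    return np.array([prior * gaussian_pdf(z, mean, std)
                     for mean, std, prior in (params[0], params[1])])

def assess(z, params, density_floor, posterior_band=0.2):
    joint = joint_densities(z, params)
    evidence = joint.sum(axis=0)
    # Guard against 0/0 when both densities underflow far from the training data.
    posterior1 = joint[1] / np.maximum(evidence, np.finfo(float).tiny)
    extrapolation = evidence < density_floor  # poorly covered by training data
    interpolation = ~extrapolation & (np.abs(posterior1 - 0.5) < posterior_band)
    return posterior1, extrapolation, interpolation

# Synthetic data: 10 descriptors, only the first one carries class information.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = np.repeat([0, 1], 100)
X[y == 1, 0] += 3.0

w = fit_fisher_direction(X, y)
z_train = X @ w
params = fit_domain(z_train, y)

# Hypothetical threshold: the 1st percentile of the training mixture density.
density_floor = np.percentile(joint_densities(z_train, params).sum(axis=0), 1)

z_far = params[0][0] - 50.0                  # far outside the training range
z_mid = 0.5 * (params[0][0] + params[1][0])  # halfway between the class means
posterior1, extrapolation, interpolation = assess(
    np.array([z_far, z_mid]), params, density_floor)
print("posterior of class 1:", posterior1)
print("extrapolation flags:", extrapolation)
print("interpolation flags:", interpolation)
```

In this sketch a query compound is flagged as an extrapolation problem when the training mixture density at its projected position falls below the floor, and as an interpolation problem when it lies in a covered region but the class posterior is close to 0.5. Both thresholds are arbitrary here and would need to be calibrated on real data.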