Probabilistic Modeling of Conformational Space for 3D Machine Learning Approaches

Mol Inform. 2010 May 17;29(5):441-55. doi: 10.1002/minf.201000036. Epub 2010 May 17.

Abstract

We present a new probabilistic encoding of the conformational space of a molecule that can be integrated into common similarity calculations. The method uses distance profiles of flexible atom-pairs and computes generative models that describe the distance distribution in the conformational space. The generative models permit the use of probabilistic kernel functions; our approach can therefore be used to extend existing 3D molecular kernel functions, as applied in support vector machines, to build QSAR models. The resulting kernels are valid 4D kernel functions and reduce the dependency of the model quality on suitable conformations of the molecules. In several experiments, the 4D kernel function extended by our approach performed robustly in comparison to the original 3D-based kernel function. The new method compares the conformational spaces of two molecules within one kernel evaluation. Hence, the number of kernel evaluations is significantly reduced in comparison to common kernel-based conformational space averaging techniques. Additionally, the performance gain of the extended model correlates with the flexibility of the data set, which enables an a priori estimation of the model improvement.
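
The idea of comparing two distance distributions in a single kernel evaluation can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the kernel, so the expected likelihood kernel (the integral of the product of two densities, which has a closed form for Gaussian mixtures) is assumed here as one possible probabilistic kernel. The function names, the single-component fit, and the example distances are all hypothetical.

```python
import numpy as np

def fit_gaussian(distances):
    """Fit a one-component Gaussian mixture (weights, means, variances)
    to the distances of one atom pair sampled over several conformers.
    A single component is a simplifying assumption of this sketch."""
    d = np.asarray(distances, dtype=float)
    variance = max(d.var(), 1e-6)  # floor avoids a degenerate zero variance
    return np.array([1.0]), np.array([d.mean()]), np.array([variance])

def expected_likelihood_kernel(gmm_a, gmm_b):
    """Integral of p_a(x) * p_b(x) dx for two 1D Gaussian mixtures.
    For each component pair the integral equals a Gaussian density with
    summed variances, evaluated at the difference of the means."""
    w_a, mu_a, var_a = gmm_a
    w_b, mu_b, var_b = gmm_b
    var = var_a[:, None] + var_b[None, :]
    diff = mu_a[:, None] - mu_b[None, :]
    dens = np.exp(-0.5 * diff**2 / var) / np.sqrt(2.0 * np.pi * var)
    return float(w_a @ dens @ w_b)

def normalized_kernel(gmm_a, gmm_b):
    """Cosine-style normalization so that k(a, a) == 1."""
    k_ab = expected_likelihood_kernel(gmm_a, gmm_b)
    k_aa = expected_likelihood_kernel(gmm_a, gmm_a)
    k_bb = expected_likelihood_kernel(gmm_b, gmm_b)
    return k_ab / np.sqrt(k_aa * k_bb)

# Hypothetical distances (in angstroms) of one atom pair over four conformers.
rigid_pair = fit_gaussian([3.0, 3.1, 2.9, 3.0])
flexible_pair = fit_gaussian([3.0, 4.5, 6.0, 5.2])

similarity = normalized_kernel(rigid_pair, flexible_pair)
print(f"normalized similarity: {similarity:.3f}")
```

In this sketch, one call to `normalized_kernel` compares the two fitted distributions as a whole. Averaging a 3D kernel over explicit conformers would instead need one evaluation per conformer pair, which is the saving the abstract refers to.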

Keywords: 4D-QSAR; Flexible atom-pair kernel; Gaussian mixture model; Machine learning; Molecular similarity; Probabilistic conformational space encoding.