QSPR Prediction of Lipophilicity for Organic Compounds Using Random Forest Technique on the Basis of Simplex Representation of Molecular Structure

Mol Inform. 2012 Apr;31(3-4):273-80. doi: 10.1002/minf.201100102. Epub 2012 Mar 12.

Abstract

The relationship between the octanol-water partition coefficient and the structures of more than twelve thousand organic compounds was investigated using a QSPR approach based on the Simplex Representation of Molecular Structure (SiRMS). The dataset used in our study included 10973 compounds with experimental lipophilicity values (LogKow). The Random Forest (RF) method was used for statistical modeling at the 2D level of representation of molecular structure. The developed models are adequate and were successfully validated with external test sets. The proposed models have a clear interpretation owing to the simplex representation of molecular structure, and they predict LogKow values with accuracy comparable to that of the best modern models. Thus, the QSPR models proposed in this study represent a powerful and easy-to-use virtual screening tool that can be recommended for prediction of the octanol-water partition coefficient.
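
As an illustration of the general workflow described above (fragment-count descriptors, a Random Forest regression, and validation on an external test set), the following is a minimal sketch and not the authors' implementation. Everything in it is an assumption made for the example: the three descriptor columns stand in for counts of hypothetical simplex fragments, the "LogKow" values are synthetic numbers generated from made-up additive contributions plus noise, and the forest is a small standard-library version of bagged regression trees with random feature subsets.

```python
import random
from statistics import mean

def best_split(rows, targets, feature_ids):
    """Return the (feature, threshold) pair with the lowest squared error, or None."""
    best, best_sse = None, float("inf")
    for f in feature_ids:
        # Dropping the largest value keeps both sides of every split non-empty.
        for t in sorted({row[f] for row in rows})[:-1]:
            left = [y for row, y in zip(rows, targets) if row[f] <= t]
            right = [y for row, y in zip(rows, targets) if row[f] > t]
            m_left, m_right = mean(left), mean(right)
            sse = (sum((y - m_left) ** 2 for y in left)
                   + sum((y - m_right) ** 2 for y in right))
            if sse < best_sse:
                best, best_sse = (f, t), sse
    return best

def grow_tree(rows, targets, rng, n_try):
    """Grow one regression tree; each node considers n_try randomly chosen features."""
    if len(set(targets)) == 1:
        return targets[0]
    split = best_split(rows, targets, rng.sample(range(len(rows[0])), n_try))
    if split is None:  # the sampled features are constant in this node
        return mean(targets)
    f, t = split
    left = [i for i, row in enumerate(rows) if row[f] <= t]
    right = [i for i, row in enumerate(rows) if row[f] > t]
    return (f, t,
            grow_tree([rows[i] for i in left], [targets[i] for i in left], rng, n_try),
            grow_tree([rows[i] for i in right], [targets[i] for i in right], rng, n_try))

def predict_tree(node, row):
    """Follow the splits until a leaf (a plain number) is reached."""
    while isinstance(node, tuple):
        f, t, low, high = node
        node = low if row[f] <= t else high
    return node

def grow_forest(rows, targets, n_trees, n_try, rng):
    """Train each tree on a bootstrap sample (drawn with replacement)."""
    forest = []
    for _ in range(n_trees):
        idx = rng.choices(range(len(rows)), k=len(rows))
        forest.append(grow_tree([rows[i] for i in idx],
                                [targets[i] for i in idx], rng, n_try))
    return forest

def predict_forest(forest, row):
    """Average the predictions of all trees."""
    return mean(predict_tree(tree, row) for tree in forest)

def rmse(predicted, observed):
    return mean((p - o) ** 2 for p, o in zip(predicted, observed)) ** 0.5

def make_molecules(n, rng):
    """Synthetic data: counts of three hypothetical fragments and a made-up target."""
    rows = [tuple(rng.randint(0, 4) for _ in range(3)) for _ in range(n)]
    targets = [0.5 * a - 1.0 * b + 0.2 * c + rng.gauss(0, 0.1) for a, b, c in rows]
    return rows, targets

rng = random.Random(0)
train_x, train_y = make_molecules(200, rng)
test_x, test_y = make_molecules(50, rng)  # external set, never used for training

forest = grow_forest(train_x, train_y, n_trees=25, n_try=2, rng=rng)
rmse_forest = rmse([predict_forest(forest, x) for x in test_x], test_y)
rmse_baseline = rmse([mean(train_y)] * len(test_y), test_y)  # always predict the mean
print(f"external RMSE: forest {rmse_forest:.2f}, mean baseline {rmse_baseline:.2f}")
```

Comparing the forest against a predict-the-mean baseline on compounds held out of training is the simplest form of external validation. A real study would replace the synthetic rows with computed descriptors and measured LogKow values.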

Keywords: 2D QSPR; Lipophilicity; Random Forest; Simplex representation.