Linear and nonlinear functions on modeling of aqueous solubility of organic compounds by two structure representation methods

Aixia Yan; Johann Gasteiger; Michael Krug; Soheila Anzali

doi:10.1023/b:jcam.0000030031.81235.05

J Comput Aided Mol Des. 2004 Feb;18(2):75-87. doi: 10.1023/b:jcam.0000030031.81235.05.

Authors

Aixia Yan¹, Johann Gasteiger, Michael Krug, Soheila Anzali

Affiliation

¹ Computer-Chemie-Centrum and Institut für Organische Chemie, Universität Erlangen-Nürnberg, Germany.

PMID: 15287695
DOI: 10.1023/b:jcam.0000030031.81235.05

Abstract

Several quantitative models for predicting the aqueous solubility of organic compounds were developed from a diverse dataset of 2084 compounds using multilinear regression analysis and backpropagation neural networks. The compounds were described by two different structure representation methods: (1) 18 topological descriptors; and (2) 32 radial distribution function codes representing the 3D structure of a molecule, plus eight additional descriptors. The dataset was divided into a training set and a test set using Kohonen's self-organizing neural network. The backpropagation neural network models gave good predictions: with the 18 topological descriptors, a correlation coefficient of 0.92 and a standard deviation of 0.62 were achieved for the 936 compounds in the test set; with the 3D descriptors, a correlation coefficient of 0.90 and a standard deviation of 0.73 were achieved for the 866 compounds in the test set. The models were also tested on another dataset, and the relationship between the two datasets was examined with Kohonen's self-organizing neural network.
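
To illustrate how a radial distribution function (RDF) code turns a 3D structure into a fixed-length vector, the following is a minimal sketch of the commonly used form g(r) = sum over atom pairs of a_i * a_j * exp(-B * (r - r_ij)^2), sampled at 32 distances. The abstract states only that 32 RDF codes were used; the function name `rdf_code`, the distance grid, the smoothing parameter B, the atom weights, and the omission of any normalization factor are all assumptions made for illustration and are not taken from the paper.

```python
import math


def rdf_code(coords, weights, n_points=32, r_min=0.0, r_max=3.1, smoothing=100.0):
    """Sample an RDF-style code at n_points equally spaced distances.

    coords    : list of (x, y, z) atom positions (assumed to be in angstroms)
    weights   : one atomic property value a_i per atom (hypothetical choice)
    smoothing : the parameter B; the value here is an assumption
    """
    step = (r_max - r_min) / (n_points - 1)
    code = []
    for k in range(n_points):
        r = r_min + k * step
        total = 0.0
        # Sum over all unique atom pairs (i < j).
        for i in range(len(coords)):
            for j in range(i + 1, len(coords)):
                r_ij = math.dist(coords[i], coords[j])
                total += weights[i] * weights[j] * math.exp(-smoothing * (r - r_ij) ** 2)
        code.append(total)
    return code


# Toy input: two atoms 1.5 units apart, each with unit weight.
example = rdf_code([(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)], [1.0, 1.0])
peak_index = example.index(max(example))
```

For this toy input the code has 32 components and peaks at the grid point nearest the interatomic distance of 1.5. A vector of this kind, together with the additional descriptors, would then serve as the input to a regression model or neural network.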

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Molecular Structure
Organic Chemicals / chemistry*
Solubility
Water / chemistry

Substances

Organic Chemicals
Water