Linear and nonlinear functions on modeling of aqueous solubility of organic compounds by two structure representation methods

J Comput Aided Mol Des. 2004 Feb;18(2):75-87. doi: 10.1023/b:jcam.0000030031.81235.05.

Abstract

Several quantitative models for predicting the aqueous solubility of organic compounds were developed from a diverse dataset of 2084 compounds using multi-linear regression analysis and backpropagation neural networks. The compounds were described by two different structure representation methods: (1) 18 topological descriptors; and (2) 32 radial distribution function codes representing the 3D structure of a molecule, plus eight additional descriptors. The dataset was divided into a training set and a test set using Kohonen's self-organizing neural network. The backpropagation neural network models gave good predictions: with the 18 topological descriptors, a correlation coefficient of 0.92 and a standard deviation of 0.62 were achieved for the 936 compounds in the test set; with the 3D descriptors, a correlation coefficient of 0.90 and a standard deviation of 0.73 were achieved for the 866 compounds in the test set. The models were also tested on another dataset, and the relationship between the two datasets was examined with Kohonen's self-organizing neural network.
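The abstract states only that the training/test division was made with Kohonen's self-organizing neural network. Below is a minimal pure-Python sketch of one common scheme, assuming descriptors are already scaled to [0, 1]. It trains a small rectangular Kohonen map, assigns each compound to its winning neuron, and then places one randomly chosen compound per occupied neuron in the training set and the remainder in the test set. The map size, the learning-rate and radius schedules, and the one-per-neuron rule are illustrative assumptions, not the authors' settings; `train_som`, `best_matching_unit` and `som_split` are hypothetical names.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return (row, col) of the neuron whose weight vector is closest to x."""
    best, best_d = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((wk - xk) ** 2 for wk, xk in zip(w, x))
            if d < best_d:
                best, best_d = (r, c), d
    return best


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols Kohonen map; assumes features scaled to [0, 1]."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    radius0 = max(rows, cols) / 2
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                      # linearly decaying learning rate
        radius = max(1.0, radius0 * (1 - frac))    # shrinking neighbourhood
        for x in rng.sample(data, len(data)):      # visit compounds in random order
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    d2 = (r - br) ** 2 + (c - bc) ** 2
                    if d2 <= radius ** 2:
                        h = math.exp(-d2 / (2 * radius ** 2))  # Gaussian neighbourhood
                        w = weights[r][c]
                        for k in range(dim):
                            w[k] += lr * h * (x[k] - w[k])
    return weights


def som_split(data, rows=5, cols=5, seed=0):
    """Hypothetical split rule: one compound per occupied neuron goes to training."""
    weights = train_som(data, rows, cols, seed=seed)
    cells = {}
    for i, x in enumerate(data):
        cells.setdefault(best_matching_unit(weights, x), []).append(i)
    rng = random.Random(seed)
    train, test = [], []
    for members in cells.values():
        pick = rng.choice(members)
        train.append(pick)
        test.extend(i for i in members if i != pick)
    return sorted(train), sorted(test)


if __name__ == "__main__":
    rng = random.Random(42)
    data = [[rng.random(), rng.random()] for _ in range(200)]
    train_idx, test_idx = som_split(data)
    print(len(train_idx), "training,", len(test_idx), "test")
```

Compounds that land on the same neuron have similar descriptors, so drawing training compounds from every occupied neuron spreads the training set over the descriptor space, and each test compound has a close neighbour in the training set.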
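Multi-linear regression and the two reported statistics can be sketched with the Python standard library alone. The sketch fits ordinary least squares with an intercept by solving the normal equations. It then reports the Pearson correlation coefficient between observed and predicted values, together with the sample standard deviation of the prediction residuals. Reading "standard deviation" this way is an assumption; the paper may define it differently, for example with a degrees-of-freedom correction. All function names are hypothetical.

```python
import math
import statistics


def solve(A, v):
    """Solve A b = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    M = [row[:] + [v[i]] for i, row in enumerate(A)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    b = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        b[i] = (M[i][n] - sum(M[i][j] * b[j] for j in range(i + 1, n))) / M[i][i]
    return b


def fit_mlr(X, y):
    """Ordinary least squares with intercept; returns [b0, b1, ..., bp]."""
    rows = [[1.0] + list(x) for x in X]  # prepend intercept column
    p = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]  # X^T X
    v = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]            # X^T y
    return solve(A, v)


def predict(b, X):
    return [b[0] + sum(bi * xi for bi, xi in zip(b[1:], x)) for x in X]


def pearson_r(a, b):
    ma, mb = statistics.mean(a), statistics.mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def evaluate(y_obs, y_pred):
    """Return (correlation coefficient, standard deviation of residuals)."""
    residuals = [o - p for o, p in zip(y_obs, y_pred)]
    return pearson_r(y_obs, y_pred), statistics.stdev(residuals)


if __name__ == "__main__":
    X = [[i, (i * i) % 7] for i in range(20)]
    y = [1 + 2 * a - 3 * b + 0.1 * ((i % 3) - 1) for i, (a, b) in enumerate(X)]
    coef = fit_mlr(X, y)
    print(coef, evaluate(y, predict(coef, X)))
```

Solving the normal equations keeps the sketch short. With many correlated topological descriptors, a QR- or SVD-based least-squares solver is numerically safer.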
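The best-performing models in the abstract are backpropagation neural networks. The sketch below is a generic network with one hidden layer, trained by backpropagation with per-sample gradient descent on squared error; it is not the authors' network. The tanh hidden activation, the linear output neuron, the weight initialisation, the learning rate and the epoch count are all assumptions, and `BackpropNet`, `train` and `mse` are hypothetical names.

```python
import math
import random


class BackpropNet:
    """One hidden layer (tanh) feeding a single linear output neuron."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        return h, sum(w * hj for w, hj in zip(self.w2, h)) + self.b2

    def predict(self, x):
        return self.forward(x)[1]

    def train_step(self, x, y, lr):
        h, out = self.forward(x)
        err = out - y  # derivative of 0.5 * (out - y)^2 with respect to out
        for j, hj in enumerate(h):
            # Back-propagate through tanh using w2[j] *before* it is updated.
            delta = err * self.w2[j] * (1 - hj * hj)
            self.w2[j] -= lr * err * hj
            for k, xk in enumerate(x):
                self.w1[j][k] -= lr * delta * xk
            self.b1[j] -= lr * delta
        self.b2 -= lr * err


def mse(net, X, y):
    return sum((net.predict(x) - t) ** 2 for x, t in zip(X, y)) / len(y)


def train(net, X, y, epochs=200, lr=0.05, seed=0):
    rng = random.Random(seed)
    order = list(range(len(y)))
    for _ in range(epochs):
        rng.shuffle(order)  # present samples in a new random order each epoch
        for i in order:
            net.train_step(X[i], y[i], lr)
    return net


if __name__ == "__main__":
    X = [[i / 10, (i % 5) / 5] for i in range(30)]
    y = [math.sin(a) + b for a, b in X]
    net = BackpropNet(n_in=2, n_hidden=4, seed=1)
    print("MSE before:", mse(net, X, y))
    train(net, X, y)
    print("MSE after:", mse(net, X, y))
```

The abstract does not say how network size or training length were chosen. Monitoring error on held-out compounds is a common way to pick both.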

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Molecular Structure
  • Organic Chemicals / chemistry*
  • Solubility
  • Water / chemistry

Substances

  • Organic Chemicals
  • Water