Comparison of predictive ability of water solubility QSPR models generated by MLR, PLS and ANN methods

Mini Rev Med Chem. 2004 Feb;4(2):167-77. doi: 10.2174/1389557043487466.

Abstract

ADME/Tox computational screening is one of the most active topics in modern drug research. About half of potential drug candidates fail because of poor ADME/Tox properties. Because the experimental determination of water solubility is time-consuming, reliable computational predictions are needed to pre-select acceptable "drug-like" compounds from diverse combinatorial libraries. Many successful attempts to predict the water solubility of compounds have been made recently. A comprehensive review of previously developed water solubility calculation methods is presented here, followed by a description of the solubility prediction method designed and used in our laboratory. We carefully selected 1381 compounds from scientific publications, collected them in a unified database, and used this dataset in the calculations. The externally validated models were based on calculated descriptors only. The aim of model optimization was to improve the statistics of repeated evaluations of the predictions, and effective descriptor scoring functions were used to facilitate quick generation of multiple linear regression (MLR), partial least squares (PLS) and artificial neural network (ANN) models with optimal predictive ability. The standard error of prediction of the best ANN model (with a 39-7-1 network structure) was 0.72 logS units, while the cross-validated squared correlation coefficient (Q²) was better than 0.85. These values give a good chance of successful pre-selection of screening compounds from virtual libraries based on predicted water solubility.
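
The two statistics quoted in the abstract can be illustrated with a minimal sketch. It assumes the common definitions: Q² = 1 - PRESS / SS_tot, computed here by leave-one-out cross-validation, and the standard error of prediction taken as the root-mean-square error on held-out compounds. The abstract does not state the exact formulas or the cross-validation scheme the authors used, so both are assumptions. The single-descriptor linear model and the function names (`fit_line`, `loo_q2`, `rmse`) are hypothetical and are not the authors' method.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x for one descriptor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def loo_q2(xs, ys):
    """Leave-one-out cross-validated Q^2 = 1 - PRESS / SS_tot (assumed definition)."""
    n = len(xs)
    mean_y = sum(ys) / n
    press = 0.0
    for i in range(n):
        # Refit the model without compound i, then predict compound i.
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a + b * xs[i])) ** 2
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_tot


def rmse(observed, predicted):
    """Root-mean-square error, used here as the standard error of prediction."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


if __name__ == "__main__":
    # Made-up illustrative values, NOT data from the paper:
    # one hypothetical descriptor per compound and a logS-like response.
    descriptor = [0.5, 1.0, 1.8, 2.4, 3.1, 3.9]
    log_s = [-0.9, -1.6, -2.3, -3.2, -3.8, -4.9]

    print("LOO Q^2:", loo_q2(descriptor, log_s))

    a, b = fit_line(descriptor, log_s)
    fitted = [a + b * x for x in descriptor]
    print("RMSE of fitted values:", rmse(log_s, fitted))
```

The same Q² and error calculations apply unchanged to PLS or ANN predictions, because they depend only on observed and predicted logS values and not on how the model was fitted.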

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Chemical Phenomena
  • Chemistry, Physical
  • Least-Squares Analysis*
  • Linear Models*
  • Models, Chemical
  • Models, Molecular
  • Models, Statistical
  • Neural Networks, Computer*
  • Quantitative Structure-Activity Relationship
  • Reproducibility of Results
  • Solubility*
  • Water / chemistry*

Substances

  • Water