Prediction models of human plasma protein binding rate and oral bioavailability derived by using GA-CG-SVM method

J Pharm Biomed Anal. 2008 Aug 5;47(4-5):677-82. doi: 10.1016/j.jpba.2008.03.023. Epub 2008 Mar 28.

Abstract

In this study, a support vector machine (SVM) method, combined with a genetic algorithm (GA) for feature selection and a conjugate gradient (CG) method for parameter optimization (GA-CG-SVM), was employed to develop prediction models of human plasma protein binding rate (PPBR) and oral bioavailability (BIO). The advantage of GA-CG-SVM is that it handles feature selection and SVM parameter optimization simultaneously. Five-fold cross-validation and an independent test set were used to validate the prediction models. For the PPBR, a total of 692 compounds were used to train and test the prediction model. The prediction accuracy by 5-fold cross-validation is 86%, and that for the independent test set (161 compounds) is 81%. These accuracies are markedly higher than those of the best model currently available in the literature. The number of descriptors selected is 29. For the BIO, the training set is composed of 690 compounds, and 76 external compounds form an independent validation set. The prediction accuracies for the training set by 5-fold cross-validation and for the independent test set are 80% and 86%, respectively, which are better than or comparable to those of other classification models in the literature. The number of descriptors selected is 25. For both the PPBR and BIO, the descriptors selected by the GA-CG method cover a wide range of molecular properties, which implies that the PPBR and BIO of a drug might be affected by many complicated factors.
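The wrapper idea described in the abstract, a GA searching over descriptor subsets with each subset scored by 5-fold cross-validation accuracy, can be illustrated with a minimal sketch. This is not the authors' implementation. To keep the example dependency-free, a nearest-centroid classifier stands in for the SVM, the CG optimization of SVM parameters is omitted entirely, and the data are synthetic. All function names, the GA settings (truncation selection, uniform crossover, bit-flip mutation, elitism), and the toy data set are assumptions made for illustration.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_accuracy(X, y, mask, k=5):
    """k-fold CV accuracy using only the descriptors switched on in mask.

    A nearest-centroid classifier is used as a stand-in for the SVM.
    """
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty descriptor subset cannot classify anything
    correct = 0
    for fold in k_fold_indices(len(X), k):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        centroids = {}
        for label in set(y[i] for i in train):
            rows = [X[i] for i in train if y[i] == label]
            centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
        for i in fold:
            def dist(label):
                return sum((X[i][j] - m) ** 2 for j, m in zip(cols, centroids[label]))
            correct += min(centroids, key=dist) == y[i]
    return correct / len(X)


def ga_select(X, y, pop_size=12, generations=10, p_mut=0.1, seed=1):
    """Search descriptor subsets (bit masks) with a simple elitist GA."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    pop[0] = [1] * n  # keep the full descriptor set in the start population
    for _ in range(generations):
        ranked = sorted(pop, key=lambda m: cv_accuracy(X, y, m), reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection
        children = [ranked[0]]             # elitism: best mask always survives
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]          # uniform crossover
            child = [bit ^ (rng.random() < p_mut) for bit in child]   # bit-flip mutation
            children.append(child)
        pop = children
    best = max(pop, key=lambda m: cv_accuracy(X, y, m))
    return best, cv_accuracy(X, y, best)


def make_toy_data(n=60, n_features=8, seed=2):
    """Synthetic two-class data; only descriptors 0 and 1 carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        row[0] += 2.0 * label
        row[1] -= 2.0 * label
        X.append(row)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    mask, acc = ga_select(X, y)
    print("selected descriptors:", [j for j, bit in enumerate(mask) if bit])
    print("5-fold CV accuracy:", round(acc, 3))
```

Because the best mask is carried over unchanged each generation and the full descriptor set is in the starting population, the returned accuracy can never fall below that of using all descriptors. In a setup closer to the one the abstract describes, the fitness function would train an SVM on each fold and the SVM parameters would be tuned alongside the mask.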

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Artificial Intelligence*
  • Biological Availability
  • Blood Proteins / metabolism*
  • Blood Proteins / pharmacokinetics*
  • Humans
  • Kinetics
  • Pattern Recognition, Automated / methods*
  • Pattern Recognition, Automated / statistics & numerical data
  • Predictive Value of Tests
  • Protein Binding
  • Reproducibility of Results
  • Software

Substances

  • Blood Proteins