Combinatorial QSAR modeling of P-glycoprotein substrates

J Chem Inf Model. 2006 May-Jun;46(3):1245-54. doi: 10.1021/ci0504317.

Abstract

Quantitative structure-activity (property) relationship (QSAR/QSPR) models are typically generated with a single modeling technique using a single type of molecular descriptor. Recently, we have begun to explore a combinatorial QSAR approach which employs various combinations of optimization methods and descriptor types and includes rigorous and consistent model validation (Kovatcheva, A.; Golbraikh, A.; Oloff, S.; Xiao, Y.; Zheng, W.; Wolschann, P.; Buchbauer, G.; Tropsha, A. Combinatorial QSAR of Ambergris Fragrance Compounds. J. Chem. Inf. Comput. Sci. 2004, 44, 582-95). Herein, we have applied this approach to a data set of 195 diverse substrates and nonsubstrates of P-glycoprotein (P-gp), which plays a crucial role in drug resistance. Modeling methods included k-nearest neighbors classification, decision tree, binary QSAR, and support vector machines (SVM). Descriptor sets included molecular connectivity indices, atom pair (AP) descriptors, VolSurf descriptors, and Molecular Operating Environment (MOE) descriptors. Each descriptor type was used with every QSAR modeling technique, so in total 16 combinations of techniques and descriptor types were considered. Although all combinations resulted in models with a high correct classification rate for the training set (CCR(train)), not all of them had high classification accuracy for the test set (CCR(test)). Thus, predictive models have been generated only for some combinations of the methods and descriptor types, and the best models were obtained using SVM classification with either AP or VolSurf descriptors; they were characterized by CCR(train) = 0.94 and 0.88 and CCR(test) = 0.81 and 0.81, respectively. The combinatorial QSAR approach identified models with higher predictive accuracy than those reported previously for the same data set. We suggest that, in the absence of any universally applicable "one-for-all" QSAR methodology, the combinatorial QSAR approach should become the standard practice in QSPR/QSAR modeling.
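
The bookkeeping behind the combinatorial scheme can be illustrated with a minimal Python sketch. It only enumerates the 4 x 4 grid of methods and descriptor sets named in the abstract and filters combinations by their training-set and test-set CCR; it does not build any models. Several things here are assumptions rather than details from the abstract: CCR is computed as the mean of the per-class fractions of correct predictions (a balanced form), the function names `ccr` and `select_predictive` are hypothetical, and the 0.8 cutoff is an arbitrary illustrative value. Only the two CCR pairs reported in the abstract are used as example data.

```python
from itertools import product

# Method and descriptor labels taken from the abstract (abbreviated).
METHODS = ["kNN", "decision tree", "binary QSAR", "SVM"]
DESCRIPTORS = ["molecular connectivity indices", "atom pair", "VolSurf", "MOE"]

# Every method is paired with every descriptor set.
combos = list(product(METHODS, DESCRIPTORS))


def ccr(y_true, y_pred):
    """Correct classification rate, assumed here to be the mean of the
    per-class fractions of correctly predicted compounds."""
    rates = []
    for cls in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        correct = sum(1 for i in idx if y_pred[i] == cls)
        rates.append(correct / len(idx))
    return sum(rates) / len(rates)


def select_predictive(results, threshold):
    """Keep combinations whose training AND test CCR reach the threshold.

    `results` maps (method, descriptor) -> (ccr_train, ccr_test).
    The threshold is a hypothetical cutoff, not a value from the paper.
    """
    return {
        combo: scores
        for combo, scores in results.items()
        if scores[0] >= threshold and scores[1] >= threshold
    }


# The only CCR values stated in the abstract (best two models).
reported = {
    ("SVM", "atom pair"): (0.94, 0.81),
    ("SVM", "VolSurf"): (0.88, 0.81),
}

if __name__ == "__main__":
    print(len(combos), "method/descriptor combinations")
    for combo, (train, test) in select_predictive(reported, 0.8).items():
        print(combo, "CCR(train) =", train, "CCR(test) =", test)
```

Requiring both scores to pass the cutoff mirrors the abstract's point that a high CCR(train) alone does not make a model predictive.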

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • ATP Binding Cassette Transporter, Subfamily B, Member 1 / metabolism*
  • Combinatorial Chemistry Techniques*
  • Decision Trees
  • Models, Molecular*
  • Protein Binding
  • Quantitative Structure-Activity Relationship

Substances

  • ATP Binding Cassette Transporter, Subfamily B, Member 1