Validated QSAR prediction of OH tropospheric degradation of VOCs: splitting into training-test sets and consensus modeling

J Chem Inf Comput Sci. 2004 Sep-Oct;44(5):1794-802. doi: 10.1021/ci049923u.

Abstract

The rate constants for tropospheric degradation by the hydroxyl radical (OH) of 460 structurally heterogeneous organic compounds are predicted by QSAR modeling. The Multiple Linear Regression (MLR) models are built on a variety of theoretical molecular descriptors, selected by the Genetic Algorithms-Variable Subset Selection (GA-VSS) procedure. The predictive power of the models was checked by both internal and external validation. For external validation, the original data set was split into training and test sets by two approaches, D-optimal Experimental Design and Kohonen Artificial Neural Networks (K-ANN), so that the two splitting methodologies could be compared. We emphasize that external validation is the only way to establish a reliable QSAR model for predictive purposes. Predictions obtained by consensus modeling, which combines several different models, are also proposed.
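The abstract gives no implementation details. The following is a minimal sketch, assuming NumPy, of the general ideas it names. It fits several MLR models on a training set and scores them on an external test set with an external Q². Here Q² uses one common definition, in which the sum of squares is taken about the training-set mean. It then forms a consensus prediction by simply averaging the models. Several parts are placeholders. The data are synthetic, and the random split stands in for the D-optimal or K-ANN splitting used in the paper. The descriptor subsets are hypothetical stand-ins for GA-VSS output, and the function names (`fit_mlr`, `predict_mlr`, `q2_ext`, `consensus_predict`) are illustrative only.

```python
import numpy as np


def fit_mlr(X, y):
    """Ordinary least-squares MLR with an intercept; returns [b0, b1, ..., bk]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    """Apply an intercept-plus-slopes MLR model to a descriptor matrix."""
    return coef[0] + X @ coef[1:]


def q2_ext(y_test, y_pred, y_train_mean):
    """External Q^2 = 1 - PRESS / SS, with SS taken about the training-set mean
    (one common definition; others use the test-set mean instead)."""
    press = np.sum((y_test - y_pred) ** 2)
    ss = np.sum((y_test - y_train_mean) ** 2)
    return 1.0 - press / ss


def consensus_predict(models, X):
    """Average the predictions of several MLR models; each model is a
    (descriptor_columns, coefficients) pair using its own descriptor subset."""
    preds = [predict_mlr(coef, X[:, cols]) for cols, coef in models]
    return np.mean(preds, axis=0)


# Synthetic data: 40 "compounds" x 4 hypothetical descriptors (not the paper's data).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 4))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + 0.1 * rng.normal(size=40)

# Hypothetical random split; the paper uses D-optimal design or K-ANN instead.
idx = rng.permutation(40)
train, test = idx[:30], idx[30:]

# Hypothetical descriptor subsets standing in for GA-VSS-selected models.
subsets = [[0, 1], [0, 1, 2], [0, 1, 3]]
models = [(cols, fit_mlr(X[train][:, cols], y[train])) for cols in subsets]

# Consensus prediction on the external test set, scored only against held-out data.
y_cons = consensus_predict(models, X[test])
q2 = q2_ext(y[test], y_cons, y[train].mean())
print(f"Q2_ext of consensus model: {q2:.3f}")
```

Simple averaging is only one way to combine models. The abstract does not say exactly how its consensus predictions are formed, so treat this as an illustration of the idea rather than the authors' procedure.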