Indices of the presence of atoms (IPA) encode the presence or absence of specific atoms, such as nitrogen, oxygen, sulphur, phosphorus, fluorine, chlorine, and bromine, in a molecule. They are calculated from the simplified molecular input line entry system (SMILES) representation. Optimizing the correlation weights of these indices with the Monte Carlo method improves the predictive ability of optimal SMILES-based descriptors in quantitative structure-activity relationship (QSAR) models of the bioconcentration factor. The model without IPA gave the following results: n=503, r²=0.6803, q²=0.6781, s=0.759, F=1066 (subtraining set); n=322, r²=0.8181, r²(pred)=0.8159, s=0.565, F=1439 (calibration set); n=105, r²=0.6703, r²(pred)=0.6577, R²(m)=0.6628, s=0.728, F=209 (test set); n=106, r²=0.6624, r²(pred)=0.6502, R²(m)=0.6212, s=0.757, F=204 (validation set). The model with IPA gave: n=503, r²=0.7082, q²=0.7062, s=0.725, F=1216 (subtraining set); n=322, r²=0.8401, r²(pred)=0.8383, s=0.528, F=1682 (calibration set); n=105, r²=0.7489, r²(pred)=0.7402, R²(m)=0.7252, s=0.637, F=307 (test set); n=106, r²=0.7306, r²(pred)=0.7217, R²(m)=0.7010, s=0.680, F=282 (validation set).
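As an illustration of the idea of presence indices, the following Python sketch derives a 0/1 indicator for each of the seven atoms named above directly from a SMILES string. It is a simplified assumption, not the encoding used in the reported models, which is not specified here: the names `TRACKED`, `atom_symbols`, and `presence_indices` are hypothetical, the tokenizer handles only the SMILES organic subset plus bracket atoms, and no correlation weights or Monte Carlo optimization are involved.

```python
import re

# Atoms whose presence is tracked (the seven named in the text).
TRACKED = ("N", "O", "S", "P", "F", "Cl", "Br")

# Bracket atom: optional isotope digits, then the element symbol,
# e.g. [NH4+], [13C], [nH], [Na+]. Hypothetical simplified pattern.
_BRACKET = re.compile(r"\[\d*([A-Z][a-z]?|[a-z]{1,2})[^\]]*\]")

# Organic-subset atoms written outside brackets; two-letter symbols
# come first so that "Cl" is not read as carbon, nor "Br" as boron.
_BARE = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")


def atom_symbols(smiles):
    """Return the element symbols found in a SMILES string, in order."""
    symbols = []
    pos = 0
    while pos < len(smiles):
        if smiles[pos] == "[":
            match = _BRACKET.match(smiles, pos)
            if match is None:
                raise ValueError(f"malformed bracket atom at position {pos}")
            # Aromatic atoms are lowercase; normalise to the element symbol.
            symbols.append(match.group(1).capitalize())
            pos = match.end()
            continue
        match = _BARE.match(smiles, pos)
        if match is not None:
            symbols.append(match.group(0).capitalize())
            pos = match.end()
        else:
            pos += 1  # bond, branch, ring-closure digit, dot, etc.
    return symbols


def presence_indices(smiles):
    """Map each tracked atom to 1 if it occurs in the molecule, else 0."""
    present = set(atom_symbols(smiles))
    return {atom: int(atom in present) for atom in TRACKED}


if __name__ == "__main__":
    for smi in ("Clc1ccccc1", "c1ccncc1", "CC(=O)O", "[Na+].[Cl-]"):
        print(smi, presence_indices(smi))
```

Handling bracket atoms separately keeps sodium in `[Na+]` from being counted as nitrogen, which a naive character search would get wrong. In a QSAR setting, each indicator would then be treated as one more molecular attribute to which a correlation weight is assigned.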
© 2010 Elsevier Masson SAS. All rights reserved.