Predicting the aquatic toxicity mode of action using logistic regression and linear discriminant analysis

SAR QSAR Environ Res. 2016 Sep;27(9):721-46. doi: 10.1080/1062936X.2016.1229691. Epub 2016 Sep 21.

Abstract

The paper highlights the use of logistic regression (LR) to construct acceptable, statistically significant, robust and predictive models for classifying chemicals according to their modes of aquatic toxic action. All elements essential to a reliable model were carefully considered. The model predictors were selected by forward stepwise linear discriminant analysis (LDA) from a combined pool of experimental data and chemical structure-based descriptors calculated with the CODESSA and DRAGON software packages. Model predictive ability was validated both internally and externally. The applicability domain was checked by the leverage approach to verify prediction reliability. The resulting models are simple and easy to interpret. In general, LR performs much better than LDA and seems to be more attractive for predicting the more toxic compounds, i.e. compounds that exhibit excess toxicity versus non-polar narcotic compounds, and more reactive compounds versus less reactive compounds. In addition, model fit and regression diagnostics were assessed with an influence plot, which shows the hat-values, studentized residuals and Cook's distance of each sample. Overdispersion was also checked for the LR model. The relationships between the descriptors and the aquatic toxic behaviour of the compounds are also discussed.
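
The leverage approach to the applicability domain mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation. The function names `leverages` and `in_domain` are hypothetical, and the warning threshold h* = 3(p + 1)/n is a convention commonly used in QSAR work, assumed here because the abstract does not state which cutoff was applied.

```python
import numpy as np

def leverages(X_train, X_query):
    """Hat values of query compounds relative to the training descriptors.

    h_i = q_i^T (A^T A)^{-1} q_i, where A is the training descriptor
    matrix with an added intercept column (an assumption of this sketch).
    """
    A = np.column_stack([np.ones(len(X_train)), X_train])
    Q = np.column_stack([np.ones(len(X_query)), X_query])
    core = np.linalg.pinv(A.T @ A)
    # Row-wise quadratic form q_i^T core q_i
    return np.einsum("ij,jk,ik->i", Q, core, Q)

def in_domain(X_train, X_query):
    """True where a query compound's leverage does not exceed h*."""
    n, p = X_train.shape
    h_star = 3.0 * (p + 1) / n  # assumed cutoff, not taken from the paper
    return leverages(X_train, X_query) <= h_star
```

A compound flagged `False` lies far from the training descriptors, so its predicted mode of action would be treated as an extrapolation rather than a reliable prediction.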

Keywords: CODESSA; DRAGON; Logistic regression; influence plot; linear discriminant analysis; modes of aquatic toxic action.

MeSH terms

  • Discriminant Analysis
  • Linear Models
  • Logistic Models
  • Quantitative Structure-Activity Relationship*
  • Reproducibility of Results
  • Software
  • Water Pollutants, Chemical / toxicity*

Substances

  • Water Pollutants, Chemical