Development of models predicting biodegradation rate rating with multiple linear regression and support vector machine algorithms

Chemosphere. 2020 Aug:253:126666. doi: 10.1016/j.chemosphere.2020.126666. Epub 2020 Apr 4.

Abstract

Biodegradation is a significant process for removing organic chemicals from water, soil and sediment, so biodegradability is critical for evaluating the environmental persistence of organic chemicals. In this study, four quantitative structure-activity relationship (QSAR) models were developed from a dataset of 171 compounds to predict primary and ultimate biodegradation rate ratings, using multiple linear regression (MLR) and support vector machine (SVM) algorithms. Two MLR models were built on the compounds with ≤9 carbon atoms, and two SVM models were built on the compounds with >9 carbon atoms. In the MLR models, nArX (number of X on aromatic ring) is the most important descriptor governing primary and ultimate biodegradation of organic chemicals. For the SVM models, the determination coefficient (R²), leave-one-out cross-validated coefficient (Q²LOO) and external validation coefficient (Q²ext) all exceed 0.9, indicating that the SVM models have satisfactory goodness-of-fit, robustness and external predictive ability. The applicability domains of the models were visualized with Williams plots. The developed models can be used as effective tools to predict the biodegradability of organic chemicals.
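The validation statistics named in the abstract can be illustrated with a minimal sketch for the MLR case. It is not the authors' implementation: the descriptor values and ratings below are synthetic placeholders, the helper names (`fit_mlr`, `q2_loo`, `leverages`) are hypothetical, and the SVM models are not reproduced. The sketch assumes ordinary least squares, computes Q²LOO as 1 − PRESS/SS_tot using the full training-set mean, and uses the commonly quoted warning leverage h* = 3k/n (k = number of descriptors + 1), which the abstract itself does not state. A Williams plot would show standardized residuals against these leverages.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * v for a, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def normal_matrix(X):
    """Return the design matrix Z (with intercept column) and Z^T Z."""
    Z = [[1.0] + list(row) for row in X]
    k = len(Z[0])
    ZtZ = [[sum(z[i] * z[j] for z in Z) for j in range(k)] for i in range(k)]
    return Z, ZtZ


def fit_mlr(X, y):
    """OLS coefficients (intercept first) from the normal equations."""
    Z, ZtZ = normal_matrix(X)
    Zty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(len(Z[0]))]
    return solve(ZtZ, Zty)


def predict(coef, row):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


def r_squared(y, y_hat):
    """1 - SS_res / SS_tot, with SS_tot taken about the mean of y."""
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def q2_loo(X, y):
    """Leave-one-out Q2: refit without compound i, then predict compound i."""
    preds = []
    for i in range(len(y)):
        coef = fit_mlr(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        preds.append(predict(coef, X[i]))
    return r_squared(y, preds)  # equals 1 - PRESS / SS_tot


def leverages(X):
    """Hat-matrix diagonal h_i = z_i^T (Z^T Z)^-1 z_i (Williams plot x-axis)."""
    Z, ZtZ = normal_matrix(X)
    return [sum(a * b for a, b in zip(z, solve(ZtZ, z))) for z in Z]


# Synthetic data (NOT from the paper): column 0 mimics a halogen count such
# as nArX, column 1 is an arbitrary second descriptor; y is a made-up rating.
X = [[0, 1.0], [0, 2.0], [0, 3.0], [1, 1.5], [1, 3.0], [1, 2.0],
     [2, 2.5], [2, 4.0], [2, 3.0], [3, 3.5], [3, 2.0], [4, 6.0]]
y = [4.1, 4.0, 3.9, 3.6, 3.3, 3.5, 3.0, 2.8, 2.9, 2.4, 2.6, 1.8]

coef = fit_mlr(X, y)
r2_fit = r_squared(y, [predict(coef, row) for row in X])
q2 = q2_loo(X, y)
h = leverages(X)
h_star = 3 * len(coef) / len(y)  # assumed warning leverage 3k/n

print("coefficients:", [round(c, 3) for c in coef])
print("R2 =", round(r2_fit, 3), " Q2LOO =", round(q2, 3))
print("compounds above h*:", [i for i, hi in enumerate(h) if hi > h_star])
```

For an OLS model Q²LOO can never exceed R², so a large gap between the two is the usual sign of overfitting, while compounds with leverage above h* lie outside the structural applicability domain and their predictions are extrapolations.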

Keywords: Biodegradability; Molecular structure descriptors; Multiple linear regression; Quantitative structure–activity relationship; Support vector machine.

MeSH terms

  • Algorithms*
  • Biodegradation, Environmental*
  • Carbon
  • Linear Models*
  • Organic Chemicals / chemistry
  • Quantitative Structure-Activity Relationship
  • Soil
  • Support Vector Machine
  • Water / chemistry

Substances

  • Organic Chemicals
  • Soil
  • Water
  • Carbon