Prediction models with multiple machine learning algorithms for POPs: The calculation of PDMS-air partition coefficient from molecular descriptor

J Hazard Mater. 2022 Feb 5;423(Pt B):127037. doi: 10.1016/j.jhazmat.2021.127037. Epub 2021 Aug 26.

Abstract

The polydimethylsiloxane-air partition coefficient (KPDMS-air) is a key parameter in passive sampling for measuring concentrations of persistent organic pollutants (POPs). In this study, 13 QSPR models were developed to predict KPDMS-air, using two descriptor selection methods (MLR and RF) and seven algorithms (MLR, LASSO, ANN, SVM, kNN, RF and GBDT). All models were built on a data set of 244 POPs from 13 different categories. Evaluation parameters calculated from the training and test sets were used for internal and external validation. Notably, the GBDT model achieved R²adj, Q²BOOT and Q²ext values of 0.995, 0.980 and 0.951, respectively, outperforming the other models in goodness of fit, robustness and predictive ability. Mechanistic interpretation indicated that molecular size, branching and bond types were the main intrinsic factors affecting the partitioning process. Unlike existing QSPR models built on a single category of compounds, the models developed here cover multiple compound classes, giving them a broader applicability domain. The models can therefore fill gaps where experimental KPDMS-air values are missing for compounds within the applicability domain, and help researchers better understand the distribution behavior of POPs from the perspective of molecular structure.
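
The abstract does not give the formulas behind the reported fit and external-validation statistics, so the sketch below uses common textbook definitions as an assumption: adjusted R² penalized by the number of descriptors, and an external Q² referenced to the training-set mean (often called Q²F1). The function names and the toy numbers are hypothetical and are not taken from the paper. The bootstrap statistic is not covered.

```python
from statistics import fmean


def r2(y_obs, y_pred):
    """Coefficient of determination on a single data set."""
    mean_obs = fmean(y_obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in y_obs)
    return 1.0 - ss_res / ss_tot


def r2_adjusted(y_obs, y_pred, n_descriptors):
    """Adjusted R^2: penalizes R^2 for the number of descriptors used."""
    n = len(y_obs)
    return 1.0 - (1.0 - r2(y_obs, y_pred)) * (n - 1) / (n - n_descriptors - 1)


def q2_ext(y_train, y_test, y_pred_test):
    """External Q^2 (assumed Q2_F1 form): test-set residuals are compared
    with the test-set deviation from the TRAINING-set mean."""
    mean_train = fmean(y_train)
    press = sum((o - p) ** 2 for o, p in zip(y_test, y_pred_test))
    ss_ext = sum((o - mean_train) ** 2 for o in y_test)
    return 1.0 - press / ss_ext


# Toy values standing in for observed and predicted log K (illustrative only).
y_train = [1.0, 2.0, 3.0, 4.0, 5.0]
y_fit = [1.1, 1.9, 3.2, 3.8, 5.0]
y_test = [2.0, 4.0]
y_test_pred = [2.5, 3.5]

fit_stat = r2_adjusted(y_train, y_fit, n_descriptors=1)
ext_stat = q2_ext(y_train, y_test, y_test_pred)
print(f"R2adj = {fit_stat:.3f}, Q2ext = {ext_stat:.3f}")
```

Referencing the external statistic to the training-set mean keeps the test set out of the baseline, which is one reason this form is commonly used for external validation. Other Q²ext variants exist and may give different values.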

Keywords: Machine learning algorithms; PDMS-air partition coefficient; POPs; Passive sampling; QSPR.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Linear Models
  • Machine Learning
  • Molecular Structure
  • Quantitative Structure-Activity Relationship*