Multiple machine learning algorithms assisted QSPR models for aqueous solubility: Comprehensive assessment with CRITIC-TOPSIS

Sci Total Environ. 2023 Jan 20;857(Pt 2):159448. doi: 10.1016/j.scitotenv.2022.159448. Epub 2022 Oct 14.

Abstract

As an essential environmental property, aqueous solubility quantifies the hydrophobicity of a compound and can be used to evaluate the ecological risk and toxicity of organic pollutants. Concerned about the proliferation of organic contaminants in water and the associated technical burden, researchers have developed QSPR models to predict aqueous solubility. However, there are no standard procedures or best practices for comprehensively evaluating such models. Hence, the CRITIC-TOPSIS comprehensive assessment method, based on a variety of statistical parameters, is proposed for the first time in the field of environmental model research. In total, 39 models based on 13 ML algorithms (belonging to 4 tribes) and 3 descriptor screening methods were developed to reliably calculate aqueous solubility values (log Kws) for organic chemicals and to verify the effectiveness of the comprehensive assessment method. The evaluations showed that the MLR-1, XGB-1, DNN-1, and kNN-1 models had better predictive accuracy and external competitiveness than the other prediction models in their respective tribes. Further, the XGB model based on SRM (XGB-1, C = 0.599) was selected as the optimal approach for predicting aqueous solubility. We hope that the proposed comprehensive evaluation approach can serve as a promising tool for selecting the optimal environmental property prediction methods.
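
The abstract does not state which statistical parameters, normalization, or distance measure the authors used, so the following is only a minimal sketch of the textbook CRITIC weighting and TOPSIS ranking steps, not the authors' implementation. The function names, the four criteria (R2, Q2, RMSE, MAE), and all scores are hypothetical placeholders rather than values from the paper. The sketch assumes no criterion is constant across models, and it assumes the reported C value is a TOPSIS relative closeness coefficient.

```python
import numpy as np

def critic_weights(X, benefit):
    """CRITIC: weight each criterion by its contrast (std) and its conflict
    with the other criteria (1 - Pearson correlation)."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = hi - lo  # assumed non-zero: no criterion is constant
    # Min-max scale so that larger is always better (cost criteria are flipped).
    Z = np.where(benefit, (X - lo) / span, (hi - X) / span)
    sigma = Z.std(axis=0, ddof=1)
    R = np.corrcoef(Z, rowvar=False)
    info = sigma * (1.0 - R).sum(axis=0)
    return info / info.sum()

def topsis_closeness(X, weights, benefit):
    """TOPSIS: relative closeness of each alternative to the ideal solution."""
    X = np.asarray(X, dtype=float)
    V = X / np.linalg.norm(X, axis=0) * weights  # vector-normalize, then weight
    best = np.where(benefit, V.max(axis=0), V.min(axis=0))
    worst = np.where(benefit, V.min(axis=0), V.max(axis=0))
    d_best = np.linalg.norm(V - best, axis=1)
    d_worst = np.linalg.norm(V - worst, axis=1)
    return d_worst / (d_best + d_worst)

# Hypothetical scores for four placeholder models (NOT results from the paper).
# Columns: R2 (benefit), Q2 (benefit), RMSE (cost), MAE (cost).
models = ["model_A", "model_B", "model_C", "model_D"]
scores = np.array([
    [0.92, 0.90, 0.45, 0.33],
    [0.88, 0.85, 0.55, 0.41],
    [0.80, 0.79, 0.70, 0.52],
    [0.85, 0.80, 0.60, 0.47],
])
benefit = np.array([True, True, False, False])

weights = critic_weights(scores, benefit)
closeness = topsis_closeness(scores, weights, benefit)
ranking = [models[i] for i in np.argsort(-closeness)]
print(dict(zip(models, np.round(closeness, 3))), ranking)
```

In this toy input model_A is best on every criterion, so it coincides with the ideal solution and ranks first. With real model statistics the criteria usually conflict, and that is when the CRITIC weights start to influence the final ordering.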

Keywords: Aqueous solubility; Comprehensive evaluation; Descriptor screening methods; Machine learning; Organic contaminants.

MeSH terms

  • Algorithms*
  • Machine Learning
  • Quantitative Structure-Activity Relationship*
  • Solubility
  • Water / chemistry

Substances

  • Water