Predicting formation of haloacetic acids by chlorination of organic compounds using machine-learning-assisted quantitative structure-activity relationships

J Hazard Mater. 2021 Apr 15:408:124466. doi: 10.1016/j.jhazmat.2020.124466. Epub 2020 Nov 9.

Abstract

The presence of disinfection byproducts (DBPs) in drinking water is a major public health concern, and an effective strategy to limit the formation of these DBPs is to prevent their precursors. In silico prediction from chemical structure would allow rapid identification of precursors and could be used as a prescreening tool to prioritize testing. We present models using machine learning algorithms (i.e., support vector regressor, random forest regressor, and multilayer perceptron regressor) and chemical descriptors as features to predict the formation of haloacetic acids (HAAs). A robust model with good predictivity (i.e., leave-one-out cross-validated Q² > 0.5) to predict the formation of trichloroacetic acid (TCAA) was developed using a random forest regressor. The number of aromatic bonds, hydrophilicity, and electrotopological descriptors related to electrostatic interactions and the atomic distribution of electronegativity were identified as important predictors of TCAA formation potentials (FPs). However, the prediction of dichloroacetic acid was less accurate, which is congruent with the presence of different types of precursors exhibiting distinct mechanisms. This study demonstrates that nonlinear combinations of general chemical descriptors can adequately estimate HAA formation potentials (HAAFPs), and we hope that our study can be used to predict precursors of other disinfection byproducts based on chemical structures using a similar workflow.
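The leave-one-out cross-validated Q² used above as the predictivity criterion can be illustrated with a minimal sketch. It assumes the common definition Q² = 1 − PRESS / SS_total, where PRESS sums the squared errors of predictions made for each compound by a model trained on all the others; the abstract does not spell out the exact formula, so this definition is an assumption. The function names, the toy nearest-neighbour model, and the toy data are hypothetical and are not taken from the study. A random forest regressor (for example scikit-learn's RandomForestRegressor) could be wrapped in the same `fit` interface.

```python
from statistics import mean


def loo_q2(X, y, fit):
    """Leave-one-out cross-validated Q2, assumed here to be 1 - PRESS / SS_total.

    X   : list of descriptor vectors, one per compound
    y   : list of measured formation potentials
    fit : callable(X_train, y_train) -> predict(x) (hypothetical interface)
    """
    n = len(y)
    if n < 3:
        raise ValueError("need at least three compounds")
    press = 0.0  # predictive residual sum of squares
    for i in range(n):
        # Hold out compound i and train on the remaining n - 1 compounds.
        X_train = [row for j, row in enumerate(X) if j != i]
        y_train = [val for j, val in enumerate(y) if j != i]
        predict = fit(X_train, y_train)
        press += (y[i] - predict(X[i])) ** 2
    y_bar = mean(y)
    ss_total = sum((val - y_bar) ** 2 for val in y)
    if ss_total == 0:
        raise ValueError("targets are constant; Q2 is undefined")
    return 1.0 - press / ss_total


def fit_nearest_neighbour(X_train, y_train):
    """Toy stand-in model: return the target of the closest training compound."""
    def predict(x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, row)) for row in X_train]
        return y_train[dists.index(min(dists))]
    return predict


def fit_mean(X_train, y_train):
    """Baseline that ignores the descriptors and predicts the training mean."""
    m = mean(y_train)
    return lambda x: m


# Made-up toy data: one hypothetical descriptor per compound, invented targets.
X_toy = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y_toy = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]

q2_nn = loo_q2(X_toy, y_toy, fit_nearest_neighbour)
q2_mean = loo_q2(X_toy, y_toy, fit_mean)
```

On this toy set the nearest-neighbour model clears the Q² > 0.5 threshold, while the mean-only baseline yields a negative Q². That is the expected behaviour of the metric: a model that uses no structural information cannot beat the overall mean when each compound is held out.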

Keywords: Anthropogenic compounds; Haloacetic acids; Machine-learning; Pollutant Release and Transfer Register; QSAR.

Publication types

  • Research Support, Non-U.S. Gov't