Trihalomethane prediction model for water supply system based on machine learning and Log-linear regression

Environ Geochem Health. 2024 Jan 16;46(2):31. doi: 10.1007/s10653-023-01778-3.

Abstract

Laboratory determination of trihalomethanes (THMs) is time-consuming, so a THM prediction model built on easily obtainable water quality parameters would be valuable. This study compared random forest regression (RFR), support vector regression (SVR), and Log-linear regression models for predicting the concentrations of total trihalomethanes (T-THMs), bromodichloromethane (BDCM), and dibromochloromethane (DBCM), using nine water quality parameters as input variables. The models were developed and tested on a dataset of 175 samples collected from a water treatment plant. With its optimal parameter combination, the RFR model outperformed the Log-linear regression model in predicting T-THMs (N25 = 82-88%, rp = 0.70-0.80), while the SVR model performed slightly better than the RFR model in predicting BDCM (N25 = 85-98%, rp = 0.70-0.97). The RFR model outperformed the other two models in predicting T-THMs and DBCM. The study concludes that the RFR model is superior overall to the SVR and Log-linear regression models and could be used to monitor THM concentrations in water supply systems.

Keywords: Disinfection by-products; Machine learning algorithms; Predictive models; Random forest; Support vector regression.

MeSH terms

  • Linear Models
  • Machine Learning
  • Trihalomethanes
  • Water Quality*
  • Water Supply*

Substances

  • bromodichloromethane
  • chlorodibromomethane
  • Trihalomethanes