Comparison of Optimal Machine Learning Algorithms for Early Detection of Unknown Hazardous Chemicals in Rivers Using Sensor Monitoring Data

Toxics. 2023 Mar 27;11(4):314. doi: 10.3390/toxics11040314.

Abstract

Water environment pollution due to chemical spills occurs constantly worldwide. When a chemical accident occurs, a quick initial response is most important. In previous studies, samples collected from chemical accident sites were subjected to laboratory-based precise analysis or predictive research through modeling. These results can be used to formulate appropriate responses in the event of chemical accidents; however, there are limitations to this process. For the initial response, it is important to quickly acquire information on chemicals leaked from the site. In this study, pH and electrical conductivity (EC), which are easy to measure in the field, were applied. In addition, 13 chemical substances were selected, and pH and EC data for each were established according to concentration change. The obtained data were applied to machine learning algorithms, including decision trees, random forests, gradient boosting, and XGBoost (XGB), to determine the chemical substances present. Through performance evaluation, the boosting method was found to be sufficient, and XGB was the most suitable algorithm for chemical substance detection.

Keywords: alternative indicator; chemical contamination; chemical detection; initial response; machine learning.