Machine learning constructs color features to accelerate development of long-term continuous water quality monitoring

J Hazard Mater. 2024 Jan 5;461:132612. doi: 10.1016/j.jhazmat.2023.132612. Epub 2023 Sep 22.

Abstract

Long-term continuous water quality monitoring (LTCM) is crucial for ensuring the safety of water resources. However, lab-based pollutant detection via machine learning (ML) usually relies on colorimetric materials or sensors, and the limitations of these sensors prevent their use for LTCM. To address this challenge, we propose a novel method that leverages image recognition to establish a relationship between pollutant concentration and color. By extracting efficient color variation features from raw pixel matrices using a combination of Kmeans clustering and RGB average features, the method directly captures the concentrations of pollutants that are difficult to distinguish with the naked eye, without the need for sensors or preprocessing. Four ML models (XGBoost, linear regression, support vector regression (SVR), and ridge regression) achieved up to a 95.9% increase in the coefficient of determination (R²) compared with principal component analysis (PCA). An R² of over 95% was achieved in predicting the concentrations of simulated pollutants such as Cu²⁺, Co²⁺, and Rhodamine B, as well as the concentration of Cr(VI) in actual electroplating wastewater, natural resource water, and drinking water. The method can effectively capture subtle color changes that cannot be observed with the naked eye, without any preprocessing of water samples, providing a reliable approach for LTCM.
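
The feature construction described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how the Kmeans and RGB average features are combined, how many clusters are used, or how images are acquired. The sketch assumes two clusters, a plain NumPy version of Lloyd's k-means, a closed-form ridge regressor, and synthetic images whose color shifts linearly with concentration. All function names (`kmeans_colors`, `color_features`, `fit_ridge`, `synthetic_image`) and all numeric settings are hypothetical.

```python
import numpy as np


def kmeans_colors(pixels, k=2, n_iter=20, seed=0):
    """Plain Lloyd's k-means on an (N, 3) RGB array; returns (k, 3) centers."""
    rng = np.random.default_rng(seed)
    centers = pixels[rng.choice(len(pixels), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign every pixel to its nearest center (squared Euclidean distance).
        dists = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = pixels[labels == j]
            if len(members):  # keep the previous center if a cluster is empty
                centers[j] = members.mean(axis=0)
    # Sort centers by brightness so the feature order is stable across images.
    return centers[np.argsort(centers.sum(axis=1))]


def color_features(image, k=2):
    """Concatenate the image's mean RGB with its k sorted cluster centers."""
    pixels = image.reshape(-1, 3).astype(float)
    return np.concatenate([pixels.mean(axis=0), kmeans_colors(pixels, k).ravel()])


def fit_ridge(X, y, alpha=1e-3):
    """Closed-form ridge regression with an unpenalized intercept term."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    penalty = alpha * np.eye(Xb.shape[1])
    penalty[-1, -1] = 0.0
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)


def predict(X, w):
    return np.hstack([X, np.ones((len(X), 1))]) @ w


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = ((y_true - y_pred) ** 2).sum()
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot


def synthetic_image(conc, rng, size=16):
    """Hypothetical stand-in for a photo: left half background, right half sample."""
    img = np.empty((size, size, 3))
    img[:, : size // 2] = [240.0, 240.0, 240.0]
    # Assumed linear color shift with concentration (illustration only).
    img[:, size // 2 :] = [200.0 - 40.0 * conc, 210.0 - 10.0 * conc, 220.0 + 20.0 * conc]
    return np.clip(img + rng.normal(0.0, 2.0, img.shape), 0, 255)


rng = np.random.default_rng(42)
conc = rng.uniform(0.0, 1.0, 60)  # synthetic concentrations, arbitrary units
X = np.array([color_features(synthetic_image(c, rng)) for c in conc])

# Fit on the first 40 images, evaluate on the remaining 20.
w = fit_ridge(X[:40], conc[:40])
r2 = r2_score(conc[40:], predict(X[40:], w))
print(f"held-out R2 on synthetic data: {r2:.3f}")
```

With real data, each `synthetic_image` call would be replaced by a photograph loaded as a height × width × 3 array, and the ridge model could be swapped for any of the regressors named in the abstract. The R² printed here reflects only the synthetic setup and says nothing about the results reported in the paper.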

Keywords: Image recognition; Kmeans; Long-term continuous water quality monitoring; Machine learning; RGB.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Environmental Pollutants*
  • Machine Learning
  • Water Quality*

Substances

  • Environmental Pollutants