Deep optimization of water quality index and positive matrix factorization models for water quality evaluation and pollution source apportionment using a random forest model

Environ Pollut. 2024 Apr 15;347:123771. doi: 10.1016/j.envpol.2024.123771. Epub 2024 Mar 15.

Abstract

Effective evaluation of water quality and accurate quantification of pollution sources are essential for the sustainable use of water resources. Although water quality index (WQI) and positive matrix factorization (PMF) models have proven applicable to surface water quality assessment and pollution source apportionment, both can be developed further with data-driven methods. This study coupled a machine learning technique, the random forest model, with the WQI and PMF models to enhance their ability to analyze water pollution issues. Monitoring data for 12 water quality indicators from six sites along the Minjiang River from 2015 to 2020 were used to build a WQI model for determining the spatiotemporal characteristics of water quality. The random forest model was then used to assess the importance of the 12 indicators relative to the WQI. Total phosphorus (TP), total nitrogen (TN), chemical oxygen demand (CODCr), dissolved oxygen (DO), and five-day biochemical oxygen demand (BOD5) were identified as the five most significant parameters influencing water quality in the region. The improved WQI model constructed from these key parameters enabled high-precision (R² = 0.9696) water quality prediction. Furthermore, the feature importance of the indicators was used as weights to adjust the results of the PMF model, allowing for a more reasonable pollutant source apportionment and revealing potential driving factors of variations in water quality. The final contributions of pollution sources, in descending order, were agricultural activities (30.26%), domestic sewage (29.07%), industrial wastewater (26.25%), seasonal factors (6.45%), soil erosion (6.19%), and unidentified sources (1.78%). This study provides a new perspective for a comprehensive understanding of the water pollution characteristics of rivers and offers a valuable reference for developing targeted strategies for water quality improvement.
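
The abstract does not give the formula used to adjust the PMF results with the random forest feature importances. As a rough illustration of the general idea only, the sketch below assumes one simple rule: each source's overall contribution is the importance-weighted average of its per-indicator shares. The function name, the two sources, the three indicators, and every number are hypothetical and are not taken from the study.

```python
# Illustrative sketch only. The weighting rule and all numbers below are
# assumptions for demonstration; they are not the study's method or data.

def weighted_source_contributions(profiles, importances):
    """Combine per-indicator source shares into overall contributions (%).

    profiles:    source -> {indicator -> share of that indicator attributed
                 to the source}; for each indicator the shares are assumed
                 to sum to 1 across sources.
    importances: indicator -> non-negative importance score (for example,
                 a random forest feature importance).
    """
    total = sum(importances.values())
    if total <= 0:
        raise ValueError("importances must have a positive sum")

    # Normalize the importances so they act as weights summing to 1.
    weights = {ind: value / total for ind, value in importances.items()}

    # Importance-weighted average of each source's indicator shares.
    return {
        source: 100.0 * sum(w * shares.get(ind, 0.0) for ind, w in weights.items())
        for source, shares in profiles.items()
    }


# Hypothetical PMF-style shares for two sources and three indicators.
profiles = {
    "agriculture": {"TP": 0.6, "TN": 0.7, "DO": 0.2},
    "domestic_sewage": {"TP": 0.4, "TN": 0.3, "DO": 0.8},
}
# Hypothetical feature importances relative to the WQI.
importances = {"TP": 0.5, "TN": 0.3, "DO": 0.2}

result = weighted_source_contributions(profiles, importances)
for source, percent in sorted(result.items(), key=lambda kv: -kv[1]):
    print(f"{source}: {percent:.2f}%")
```

Because the shares for each indicator sum to 1 across sources and the weights are normalized, the resulting contributions sum to 100%, matching the form in which the abstract reports its source contributions.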

Keywords: Machine learning; Positive matrix factorization model; River contaminants; Source apportionment; Water quality evaluation.

MeSH terms

  • China
  • Environmental Monitoring / methods
  • Random Forest
  • Rivers
  • Water Pollutants, Chemical* / analysis
  • Water Pollution / analysis
  • Water Quality*

Substances

  • Water Pollutants, Chemical