Spatial heterogeneity modeling of water quality based on random forest regression and model interpretation

Environ Res. 2021 Nov;202:111660. doi: 10.1016/j.envres.2021.111660. Epub 2021 Jul 12.

Abstract

A systematic understanding of the spatial distribution of water quality is critical for successful watershed management; however, the limited number of physical monitoring stations has restricted both the evaluation of the spatial distribution of water quality and the identification of the features that affect it. To fill this gap, we developed a modeling process that employed random forest regression (RFR) to model the water quality distribution of the Taihu Lake basin in Zhejiang Province, China, and adopted the Shapley Additive exPlanations (SHAP) method to interpret the underlying driving forces. We first used RFR to model three water quality parameters from 16 watershed features: the permanganate index (CODMn), total phosphorus (TP), and total nitrogen (TN). We then applied the fitted models to generate water quality distribution maps for the basin, with CODMn ranging from 1.39 to 6.40 mg/L, TP from 0.02 to 0.23 mg/L, and TN from 1.43 to 4.27 mg/L. The maps showed broadly consistent spatial patterns for CODMn, TN, and TP, with only minor differences among them. The SHAP analysis showed that TN was mainly affected by agricultural non-point sources, whereas CODMn and TP were affected by both agricultural and domestic sources. Because sewage collection and treatment differ between urban and rural areas, water quality in densely populated urban areas was better than in rural areas, which produced an unexpected positive relationship between water quality and population density. Overall, the RFR models and SHAP interpretation yielded a continuous distribution pattern of water quality and identified its driving forces in the basin. These findings provide important information to support water quality restoration projects.
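The SHAP step described above attributes each model prediction (for example, the TN estimate at one map location) to the input features using Shapley values from cooperative game theory. As an illustration only, and not the authors' code or the `shap` package's tree algorithm, the minimal sketch below computes exact Shapley values by brute-force enumeration of feature coalitions. It assumes an interventional value function (features outside a coalition are filled in from background rows), and the two features and the step-function model standing in for a fitted random forest are hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Set

# A "model" maps one feature vector to one prediction (e.g., TN in mg/L).
Model = Callable[[Sequence[float]], float]


def coalition_value(model: Model, x: Sequence[float],
                    background: Sequence[Sequence[float]],
                    coalition: Set[int]) -> float:
    """Mean prediction when features in `coalition` are fixed to x and all
    other features are taken from each background row in turn."""
    total = 0.0
    for row in background:
        mixed = [x[j] if j in coalition else row[j] for j in range(len(x))]
        total += model(mixed)
    return total / len(background)


def shapley_values(model: Model, x: Sequence[float],
                   background: Sequence[Sequence[float]]) -> List[float]:
    """Exact Shapley values by enumerating every coalition (2^(n-1) subsets
    per feature), so this is only practical for a handful of features."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                gain = (coalition_value(model, x, background, s | {i})
                        - coalition_value(model, x, background, s))
                phi[i] += weight * gain
    return phi


# Hypothetical stand-in for a fitted random forest: a piecewise-constant
# function of two made-up features (cropland fraction, sewer coverage).
def toy_tn_model(v: Sequence[float]) -> float:
    tn = 1.5
    if v[0] > 0.5:  # high cropland share raises predicted TN
        tn += 1.0
    if v[1] < 0.5:  # poor sewer coverage raises predicted TN
        tn += 0.5
    return tn


background_rows = [[0.2, 0.8], [0.7, 0.3], [0.4, 0.6], [0.9, 0.9]]
site = [0.8, 0.2]
phi = shapley_values(toy_tn_model, site, background_rows)
baseline = sum(toy_tn_model(r) for r in background_rows) / len(background_rows)
print("per-feature contributions:", phi)
print("baseline + sum(phi):", baseline + sum(phi),
      "prediction:", toy_tn_model(site))
```

The last line checks the efficiency property: the mean background prediction plus the per-feature contributions reproduces the model's prediction at the site, which is what lets a SHAP analysis split a single mapped value into driver contributions. Brute-force enumeration grows exponentially with the number of features, so for a 16-feature random forest a dedicated tree algorithm such as the one behind the `shap` package's `TreeExplainer` is the practical choice.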

Keywords: Driving force analysis; Machine learning; Random forest regression; Shapley additive explanations; Water quality assessment.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • China
  • Environmental Monitoring
  • Lakes
  • Nitrogen / analysis
  • Phosphorus / analysis
  • Rivers
  • Water Pollutants, Chemical* / analysis
  • Water Quality*

Substances

  • Water Pollutants, Chemical
  • Phosphorus
  • Nitrogen