Integrated machine learning reveals aquatic biological integrity patterns in semi-arid watersheds

J Environ Manage. 2024 May:359:121054. doi: 10.1016/j.jenvman.2024.121054. Epub 2024 May 9.

Abstract

Semi-arid regions present unique challenges for maintaining aquatic biological integrity because of their complex evolutionary mechanisms. Uncovering the spatial patterns of aquatic biological integrity in these areas is difficult, especially under compound environmental stress. This study develops a spatial analysis and diagnosis method for aquatic biological integrity that combines machine learning with statistical analysis, in order to reveal the spatial differentiation patterns of aquatic biological integrity in semi-arid regions and the causes of their change. Using an approach that combines XGBoost-SHAP with Fuzzy C-means clustering (FCM), we identified and diagnosed the spatial variations of aquatic biological integrity in the Wei River Basin (WRB). The species number, diversity, and aquatic biological integrity of phytoplankton all show significant spatial variation, reflecting the varied responses of biological communities to environmental gradients. The XGBoost-SHAP analysis identifies annual average temperature (AT) as the key driver of spatial variation in the Phytoplankton Integrity Index (P-IBI); AT has a positive influence on P-IBI when it is below 11.8 °C. Interactions between hydrological variables (VF and RW) and AT, and between water quality parameters (WT, NO3-N, TP, COD) and AT, jointly shape the spatial distribution of P-IBI. Combining XGBoost-SHAP with FCM reveals a pronounced north-south gradient in aquatic biological integrity across the watershed and divides the region into four distinct zones. This establishes scientific boundary conditions for the conservation strategies and management practices of aquatic ecosystems in the region, and the approach is flexible enough to be applied to the analysis of spatial heterogeneity in other complex environmental contexts.
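
The zoning step described above relies on Fuzzy C-means, which assigns each site a graded membership in every cluster instead of a single hard label. The following is a minimal, dependency-free sketch of the standard FCM update rules, not the authors' implementation. The function name `fuzzy_c_means`, the fuzzifier `m = 2`, the two-cluster setting, and the six site vectors are all assumptions made for illustration; the values are invented and are not data from the study, which reports four zones.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, max_iter=200, tol=1e-6, seed=0):
    """Generic Fuzzy C-means. Returns (centers, memberships).

    memberships[i][k] is the degree (0..1) to which point i belongs to
    cluster k; each row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, normalised so each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(max_iter):
        # Centre update: mean of the points weighted by membership ** m.
        centers = []
        for k in range(n_clusters):
            w = [u[i][k] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / w_sum
                 for d in range(dim)]
            )

        # Membership update from relative distances to every centre.
        shift = 0.0
        for i in range(n):
            dist = [max(math.dist(points[i], c), 1e-12) for c in centers]
            new_row = [
                1.0 / sum((dist[k] / dist[j]) ** (2.0 / (m - 1.0))
                          for j in range(n_clusters))
                for k in range(n_clusters)
            ]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row

        if shift < tol:  # memberships have stopped changing
            break

    return centers, u


# Hypothetical per-site feature vectors (for example, two SHAP contributions
# per sampling site). Invented values, for illustration only.
sites = [
    (0.9, 0.1), (1.0, 0.2), (0.8, 0.15),        # one made-up group of sites
    (-0.9, -0.1), (-1.0, -0.2), (-0.8, -0.05),  # a second made-up group
]

centers, memberships = fuzzy_c_means(sites, n_clusters=2)

# Hard zone label per site: the cluster with the largest membership.
labels = [max(range(2), key=row.__getitem__) for row in memberships]
```

Sites near a zone boundary would show memberships split between clusters, which is the information a hard clustering method would discard.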

Keywords: Aquatic biological integrity; Fuzzy C-Means clustering; Spatial analysis; Wei River Basin; XGBoost-SHAP algorithm.

MeSH terms

  • Algorithms
  • Environmental Monitoring / methods
  • Machine Learning*
  • Phytoplankton
  • Rivers