Ensemble learning-based applied research on heavy metals prediction in a soil-rice system

Sci Total Environ. 2023 Nov 10:898:165456. doi: 10.1016/j.scitotenv.2023.165456. Epub 2023 Jul 13.

Abstract

Accurate prediction of heavy metal accumulation in soil ecosystems is crucial for maintaining healthy soil environments and ensuring high-quality agricultural products, but it remains a challenging scientific task. In this study, we constructed a dataset containing 490 sets of multidimensional environmental covariate data and proposed prediction models for heavy metal concentrations (HMC) in a soil-rice system, EL-HMC (comprising RF-HMC and GBM-HMC), based on the Random Forest (RF) and Gradient Boosting Machine (GBM) ensemble learning (EL) techniques. To evaluate the effectiveness of each model, multiple linear and Bayesian regressions were selected as benchmark models (BMs), and mean absolute error (MAE), root mean square error (RMSE), and the coefficient of determination (R²) were selected as evaluation indicators. In addition, sensitivity and spatial autocorrelation (SAC) analyses were used to examine the robustness of the models. The results showed that the R² values of RF-HMC and GBM-HMC for modeling available cadmium (Cd) concentrations in soil were 0.654 and 0.690, respectively, an average increase of 48.0% compared with the BMs. The R² values of RF-HMC and GBM-HMC for predicting Cd, lead (Pb), chromium (Cr), and mercury (Hg) concentrations in rice ranged from 0.618 to 0.824 and from 0.645 to 0.850, respectively, an average increase of 58.2% compared with the BMs. The corresponding MAEs and RMSEs of RF-HMC and GBM-HMC were low. Sensitivity analysis of the input features and the SAC of the prediction bias showed that the EL-HMC models are highly robust. Therefore, the EL-based prediction models for HMCs proposed herein are practical and feasible, demonstrating better accuracy and stability than the traditional benchmark models. This study verifies the application potential of EL technology in pollution ecology and provides a new perspective and solution for the sustainable management and precise prevention of heavy metal pollution in farmland soil at the regional scale.
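
The three evaluation indicators named in the abstract (MAE, RMSE, and the coefficient of determination) have standard textbook definitions, which the following minimal Python sketch illustrates. It is not the authors' code: the function names `mae`, `rmse`, and `r2` are hypothetical, and the `observed` and `predicted` values are invented for illustration only and are not data from the study.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: average of |observed - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error: square root of the mean squared residual."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy concentrations, invented for illustration (NOT values from the study).
observed = [0.10, 0.20, 0.30, 0.40]
predicted = [0.12, 0.18, 0.33, 0.37]

print(f"MAE  = {mae(observed, predicted):.4f}")
print(f"RMSE = {rmse(observed, predicted):.4f}")
print(f"R2   = {r2(observed, predicted):.4f}")
```

Lower MAE and RMSE and a higher coefficient of determination indicate a better fit, which is the sense in which the abstract compares the ensemble models with the benchmark regressions.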

Keywords: Environmental factor; Gradient boosting machine; Random forest; Sensitivity; Spatial pattern.

MeSH terms

  • Bayes Theorem
  • Cadmium / analysis
  • China
  • Ecosystem
  • Environmental Monitoring / methods
  • Machine Learning
  • Mercury* / analysis
  • Metals, Heavy* / analysis
  • Oryza*
  • Risk Assessment
  • Soil
  • Soil Pollutants* / analysis

Substances

  • Soil
  • Cadmium
  • Soil Pollutants
  • Metals, Heavy
  • Mercury