Development and application of machine learning models for prediction of soil available cadmium based on soil properties and climate features

Environ Pollut. 2024 May 10:355:124148. doi: 10.1016/j.envpol.2024.124148. Online ahead of print.

Abstract

Identifying the key factors that influence soil available cadmium (Cd) is crucial for preventing Cd accumulation in the food chain. However, current experimental methods and traditional prediction models for assessing available Cd are time-consuming and ineffective. In this study, machine learning (ML) models were developed to investigate the intricate interactions among soil properties, climate features, and available Cd, with the aim of identifying these key factors. The optimal model was obtained through a combination of stratified sampling, Bayesian optimization, and 10-fold cross-validation. It was then interpreted using permutation feature importance, 2D partial dependence plots, and 3D interaction plots. The findings revealed that pH, surface pressure, sensible heat net flux, and organic matter content significantly influenced Cd accumulation in soil. Using historical soil survey and climate change data from China, this study predicted the spatial distribution trend of available Cd across China and highlighted the primary areas with heightened Cd activity. These areas were primarily located in eastern, southern, central, and northeastern China. This study introduces a novel methodology for understanding the process of available Cd accumulation in soil. Furthermore, it provides recommendations and directions for the remediation and control of soil Cd pollution.
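
As a rough illustration of two of the techniques named above (stratified 10-fold cross-validation and permutation feature importance), the following minimal sketch may help. It is not the study's pipeline: the data are synthetic, the feature names and the toy relationship between features and target are assumptions, a simple k-nearest-neighbour regressor stands in for the unspecified ML models, and the Bayesian optimization step is omitted (k is fixed). Only the Python standard library is used so that the example stays self-contained.

```python
import random
import statistics

FEATURES = ["pH", "organic_matter", "noise"]  # hypothetical feature names


def make_synthetic_data(n=200, seed=0):
    """Synthetic stand-in data; every feature is already scaled to [0, 1]."""
    rng = random.Random(seed)
    X = [[rng.random() for _ in FEATURES] for _ in range(n)]
    # Assumed toy relationship (not the study's): the target falls with
    # pH and organic matter, and the third feature has no effect.
    y = [3.0 - 2.0 * ph - 1.0 * om + rng.gauss(0, 0.05) for ph, om, _ in X]
    return X, y


def stratified_folds(y, n_folds=10, n_bins=5, seed=0):
    """Assign samples to folds so each fold spans the target's range.

    The continuous target is cut into quantile bins; the samples of each
    bin are shuffled and dealt to the folds round-robin.
    """
    rng = random.Random(seed)
    order = sorted(range(len(y)), key=lambda i: y[i])
    bin_size = -(-len(y) // n_bins)  # ceiling division
    folds = [0] * len(y)
    next_fold = 0
    for start in range(0, len(order), bin_size):
        members = order[start:start + bin_size]
        rng.shuffle(members)
        for i in members:
            folds[i] = next_fold % n_folds
            next_fold += 1
    return folds


def knn_predict(X_train, y_train, x, k=5):
    """Predict by averaging the targets of the k nearest training rows."""
    nearest = sorted(
        range(len(X_train)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(X_train[i], x)),
    )[:k]
    return statistics.fmean([y_train[i] for i in nearest])


def mse(X_train, y_train, X_test, y_test):
    """Mean squared error of the k-NN model on the given test rows."""
    errors = [
        (knn_predict(X_train, y_train, x) - t) ** 2
        for x, t in zip(X_test, y_test)
    ]
    return statistics.fmean(errors)


def cv_permutation_importance(X, y, n_folds=10, seed=0):
    """Mean increase in held-out MSE when one feature column is shuffled."""
    rng = random.Random(seed)
    folds = stratified_folds(y, n_folds=n_folds, seed=seed)
    totals = [0.0] * len(X[0])
    for fold in range(n_folds):
        train = [i for i, f in enumerate(folds) if f != fold]
        test = [i for i, f in enumerate(folds) if f == fold]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        X_te, y_te = [X[i] for i in test], [y[i] for i in test]
        baseline = mse(X_tr, y_tr, X_te, y_te)
        for j in range(len(totals)):
            column = [row[j] for row in X_te]
            rng.shuffle(column)  # break the link between feature j and target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X_te, column)]
            totals[j] += mse(X_tr, y_tr, X_perm, y_te) - baseline
    return [t / n_folds for t in totals]


X, y = make_synthetic_data()
importance = dict(zip(FEATURES, cv_permutation_importance(X, y)))
for name, value in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.3f}")
```

A larger importance value means that shuffling the feature degrades held-out predictions more, which is the sense in which a feature is called influential here. In practice a library implementation and a tuned model would likely replace the hand-written pieces.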

Keywords: Available cadmium; Climate change; Machine learning; Soil properties; Soil remediation.