Towards interpretable machine learning for observational quantification of soil heavy metal concentrations under environmental constraints

Sci Total Environ. 2024 May 20:926:171931. doi: 10.1016/j.scitotenv.2024.171931. Epub 2024 Mar 24.

Abstract

Monitoring heavy metal concentrations in soils is central to assessing agricultural production safety. Satellite observations permit inferring concentrations from spectra, thereby contributing to the prevention and control of soil heavy metal pollution. However, heavy metals exhibit weak spectral responses, particularly at low and medium concentrations, and these responses are predominantly influenced by other soil components. Machine learning (ML)-driven modelling can produce predictions but lacks interpretability. Here, we present an interpretable ML framework for concentration quantification modelling and investigate the contributions of spectral and environmental factors (pH and organic carbon) to the estimation of metals across multiple concentration gradients, as analysed through SHAP (SHapley Additive exPlanations) values derived from four learning-based scenarios. The results indicated that scenarios SHC (spectral, pH, and organic carbon) and SH (spectral and pH) were optimal for chromium (Cr) [RPD = 1.42, Adj R² = 0.62] and cadmium (Cd) [RPD = 1.80, Adj R² = 0.80]. Under environmental constraints, the spectral predictability for Cr and Cd improved by 67% and 87%, respectively. We conclude that interpretable modelling that uses both spectral and soil environmental factors holds significant potential for estimating heavy metals across concentration gradients. We recommend selecting samples with higher organic carbon content and lower pH to enhance Cr and Cd predictions. A better understanding of interpretable predictions facilitates earlier warning of heavy metal contamination and guides the formulation of robust sampling strategies.
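The two scores reported above, RPD and Adj R2, can be illustrated with a minimal sketch. It assumes the conventions common in soil spectroscopy: RPD as the standard deviation of the observed values divided by the RMSE of prediction, and adjusted R2 as R2 penalised for the number of predictors. The abstract does not state which exact variants the authors used (for instance, sample versus population standard deviation), so the function names, the `n_predictors` argument, and the choice of sample standard deviation are assumptions made for illustration only.

```python
import numpy as np


def rpd(y_true, y_pred):
    """Ratio of performance to deviation: SD of observations / RMSE.

    Assumption: sample standard deviation (ddof=1); some studies use
    the population form instead.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return np.std(y_true, ddof=1) / rmse


def adjusted_r2(y_true, y_pred, n_predictors):
    """Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1).

    n_predictors (p) is a hypothetical argument: the number of input
    features used by the model being scored.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n = y_true.size
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)
```

Under these definitions, a higher RPD means the prediction error is small relative to the natural spread of the measured concentrations, while adjusted R2 discounts gains that come only from adding more input features, which matters when comparing scenarios with different feature sets.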

Keywords: Hyperspectral; Interpretability; Pollutants; Prediction; Remote sensing; Satellite.