Would it be better for partition prediction of heavy metal concentration in soils based on the fusion of XRF and Vis-NIR data?

Sci Total Environ. 2024 Jan 15:908:168381. doi: 10.1016/j.scitotenv.2023.168381. Epub 2023 Nov 10.

Abstract

Heavy metal (HM) contamination in soil necessitates effective methods to diagnose suspected contaminated areas and control rehabilitation processes. The synergistic use of proximal sensors shows significant potential for rapid, accurate surveys of soil HM pollution at large scales and high sampling densities, but it requires the selection of appropriate data mining and modeling methods for early diagnosis of soil pollution. The aim of this study is to compare the performance of subarea models, built on geographic partitions of the study area, with that of global models for predicting soil Cu and Pb concentrations, using a random forest model with high-precision energy dispersive X-ray fluorescence (HD-XRF) and visible near-infrared (vis-NIR) spectra. A total of 166 soil samples are acquired from a contaminated plot in Baiyin, Gansu Province, China. The soil samples are subjected to HM analysis and proximal sensor scanning in a laboratory. Vis-NIR spectral data are preprocessed using the Savitzky-Golay (SG) and first-order derivative with Savitzky-Golay (SGFD) methods. The results show that, for predicting Cu and Pb concentrations in soil, the subarea models perform better than the global models in terms of quantitative prediction when based on HD-XRF data alone. For the subarea and global models, the R² values are 0.961 and 0.981, respectively; the RMSE values are 27.8 and 79.6, respectively; and the RPD values are 4.96 and 7.38, respectively. However, when the random forest algorithm is trained on fused data from the HD-XRF and vis-NIR sensors, the global model achieves the best predictions for Cu and Pb concentrations with HD-XRF + vis-NIR (SGFD) and HD-XRF + vis-NIR (SG), respectively. The results provide a new perspective on modeling approaches for rapidly inverting HM concentrations from fused proximal sensor data over large study areas.
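The three metrics reported above (R², RMSE, and RPD) can be illustrated with a minimal sketch. The abstract does not state the exact formulas used, so this assumes the common definitions: R² as one minus the ratio of residual to total sum of squares, RMSE as the root of the mean squared residual, and RPD as the sample standard deviation of the observed values divided by the RMSE. The function name `regression_metrics` and the example values are hypothetical and are not data from the study.

```python
import math


def regression_metrics(observed, predicted):
    """Return (R2, RMSE, RPD) for paired observed and predicted values.

    Assumed definitions (not confirmed by the abstract):
      R2   = 1 - SS_res / SS_tot
      RMSE = sqrt(SS_res / n)
      RPD  = sample SD of observed (n - 1 denominator) / RMSE
    """
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")

    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")

    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    sd_obs = math.sqrt(ss_tot / (n - 1))
    # A perfect fit has RMSE 0, so RPD is unbounded in that case.
    rpd = sd_obs / rmse if rmse > 0 else math.inf
    return r2, rmse, rpd


if __name__ == "__main__":
    # Hypothetical concentrations in arbitrary units, for illustration only.
    observed = [10.0, 20.0, 30.0, 40.0]
    predicted = [12.0, 18.0, 33.0, 39.0]
    r2, rmse, rpd = regression_metrics(observed, predicted)
    print(f"R2={r2:.3f} RMSE={rmse:.3f} RPD={rpd:.3f}")
```

Because RPD scales the RMSE by the spread of the observed values, models evaluated on sample sets with different variability can rank differently on RPD than on RMSE alone.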

Keywords: Accurate investigation; Hyperspectral inversion; Machine learning; Rapid detection; Subarea modeling.