Prediction of the concentration of cadmium in agricultural soil in the Czech Republic using legacy data, preferential sampling, Sentinel-2, Landsat-8, and ensemble models

J Environ Manage. 2023 Mar 15;330:117194. doi: 10.1016/j.jenvman.2022.117194. Epub 2023 Jan 3.

Abstract

The current study assesses and predicts cadmium (Cd) concentration in agricultural soil using two Cd datasets, namely legacy data (LD) and preferential sampling-legacy data (PS-LD), along with four streams of auxiliary datasets extracted from Sentinel-2 (S2) and Landsat-8 (L8) bands. The study was divided into two contexts: Cd prediction in agricultural soil using LD, ensemble models, and the 10 and 20 m spatial resolutions of S2 and L8 (context 1), and Cd prediction in agricultural soil using PS-LD, ensemble models, and the 10 and 20 m spatial resolutions of S2 and L8 (context 2). In context 1, ensemble 1 with L8 and PS-LD was the best overall approach, predicting Cd in agricultural soil with R² = 0.76, a root mean square error (RMSE) of 0.66, a mean absolute error (MAE) of 0.35, and a median absolute error (MdAE) of 0.13. In context 2, ensemble 1 with S2 and PS-LD was the best approach for predicting Cd concentration in agricultural soil, with R² = 0.78, RMSE = 0.63, MAE = 0.34, and MdAE = 0.15. Overall, the predictions from both contexts indicated that ensemble 1 with S2 combined with PS-LD was the most appropriate model for Cd prediction in agricultural soil. The uncertainty of the modeling approaches in both contexts was assessed using ensemble-sequential Gaussian simulation (EnSGS), which revealed that the uncertainty propagated across the study area was within 5% in both contexts. Combining the PS dataset and the LD with ensemble models and the remote sensing dataset produced promising results. In addition, the 20 m spatial resolution band dataset outperformed the 10 m spatial resolution band dataset in predicting Cd in agricultural soil. Good results are obtained when PS is combined with LD and both an appropriate modeling approach and a well-correlated remote sensing dataset are used.
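The four accuracy measures reported in the abstract (R², RMSE, MAE, and MdAE) can be computed from paired observed and predicted Cd concentrations roughly as in the minimal Python sketch below. It assumes that R² is the coefficient of determination (1 − SS_res/SS_tot); some studies instead report the squared Pearson correlation. The abstract does not say how "ensemble 1" combines its base models, so the `average_ensemble` helper, a plain element-wise mean, is a hypothetical stand-in and not the authors' method. All function names and sample values are illustrative and are not taken from the study.

```python
from math import sqrt
from statistics import fmean, median


def regression_metrics(observed, predicted):
    """Return R², RMSE, MAE and MdAE for paired observed/predicted values.

    Assumption: R² is the coefficient of determination (1 - SS_res / SS_tot),
    which may differ from the definition used in the original study.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need at least two paired observed/predicted values")
    residuals = [o - p for o, p in zip(observed, predicted)]
    abs_errors = [abs(r) for r in residuals]
    mean_obs = fmean(observed)
    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("observed values are constant; R² is undefined")
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": sqrt(ss_res / len(observed)),
        "MAE": fmean(abs_errors),
        "MdAE": median(abs_errors),
    }


def average_ensemble(*model_predictions):
    """Hypothetical ensemble: element-wise mean of several base-model predictions."""
    return [fmean(values) for values in zip(*model_predictions)]


if __name__ == "__main__":
    # Illustrative values only (not data from the study).
    observed = [1.0, 2.0, 3.0, 4.0]
    model_a = [1.0, 2.5, 2.0, 5.0]
    model_b = [2.0, 1.5, 3.0, 5.0]
    ensemble = average_ensemble(model_a, model_b)  # [1.5, 2.0, 2.5, 5.0]
    print(regression_metrics(observed, ensemble))
```

For these toy values the ensemble prediction is [1.5, 2.0, 2.5, 5.0], giving R² = 0.7, RMSE ≈ 0.61, MAE = 0.5, and MdAE = 0.5. Reporting MdAE alongside MAE, as the study does, shows whether a few large errors dominate the mean: here both are 0.5, so no single point dominates.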

Keywords: Ensemble models; Legacy data; Preferential sampling; Remote sensing; Uncertainty assessment.

MeSH terms

  • Cadmium
  • Czech Republic
  • Environmental Monitoring / methods
  • Soil Pollutants* / analysis
  • Soil*

Substances

  • Soil
  • Cadmium
  • Soil Pollutants