Accuracy and uncertainty of geostatistical models versus machine learning for digital mapping of soil calcium and potassium

Environ Monit Assess. 2022 Sep 10;194(10):760. doi: 10.1007/s10661-022-10434-9.

Abstract

Accuracy and uncertainty of models used for digital soil mapping are important for assessing confidence in predictions and for reliable land use planning and management. In this study, geostatistical (spatial) and machine learning (ML) approaches were evaluated for predictive mapping of soil calcium (Ca) and potassium (K). Two spatial models, empirical Bayesian kriging (EBK) and sequential Gaussian simulation (SGS), were compared with three ML models, Cubist, random forest (RF) and support vector machine (SVM), in terms of their accuracy and uncertainty. The study area is in Nowley, New South Wales, Australia; it covers 2083 ha and includes a variety of soil types and farming systems. Data from 240 soil samples were used for model training, and data from 102 independent samples were used for validation. Accuracy was assessed using R², root mean square error (RMSE), concordance and bias, and uncertainty was assessed using confidence limits. In addition, to compare the outcomes for the two soil properties, which have different measurement units, mean absolute percentage error (MAPE) and relative uncertainty (RU) were evaluated as accuracy and uncertainty measures, respectively. For the K map, SGS had the highest R² (0.74) and lowest RMSE (1.96), followed by EBK with R² = 0.72 and RMSE = 2.02. For the Ca map, EBK showed the highest accuracy (R² = 0.46; RMSE = 3.21), followed by SVM and SGS with comparable accuracies. Comparing the two soil properties, the Ca map showed higher MAPE and RU than the K map. The lowest MAPE was obtained for the EBK model (for K = 39) and the SGS model (for K = 44). The lowest RU values were also found for the EBK and SGS models. Among the ML models, SVM showed lower sensitivity to higher variance in the input data. In general, the spatial models outperformed the ML models with regard to both accuracy and uncertainty. An additional conclusion is that, considering the data variance in the two soil properties, the geostatistical models, with lower RU and MAPE, were relatively less susceptible to data variance than the ML models.
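
The accuracy measures named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formulas the authors used, so the definitions here are assumptions: R-squared is taken as the coefficient of determination (1 minus the ratio of residual to total sum of squares), concordance as Lin's concordance correlation coefficient, bias as the mean of predicted minus observed, and MAPE as the mean of absolute errors relative to the observed values, in percent. The function name `accuracy_metrics` is hypothetical, and relative uncertainty (RU) is left out because the abstract does not define it.

```python
import math


def accuracy_metrics(observed, predicted):
    """Return RMSE, bias, R-squared, concordance and MAPE for paired values.

    Assumed definitions (not taken from the paper): R-squared is the
    coefficient of determination, concordance is Lin's CCC, and bias is
    mean(predicted - observed).
    """
    obs = [float(v) for v in observed]
    pred = [float(v) for v in predicted]
    if not obs or len(obs) != len(pred):
        raise ValueError("observed and predicted must be non-empty and equal in length")

    n = len(obs)
    errors = [p - o for o, p in zip(obs, pred)]
    mean_obs = sum(obs) / n
    mean_pred = sum(pred) / n

    # Root mean square error and mean error (bias).
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    bias = sum(errors) / n

    # Coefficient of determination against the observed mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    r2 = 1.0 - ss_res / ss_tot

    # Lin's concordance correlation coefficient (population moments).
    var_obs = ss_tot / n
    var_pred = sum((p - mean_pred) ** 2 for p in pred) / n
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(obs, pred)) / n
    ccc = 2.0 * cov / (var_obs + var_pred + (mean_obs - mean_pred) ** 2)

    # Mean absolute percentage error; undefined if any observed value is zero.
    mape = 100.0 * sum(abs(e) / abs(o) for e, o in zip(errors, obs)) / n

    return {"rmse": rmse, "bias": bias, "r2": r2, "ccc": ccc, "mape": mape}
```

Because MAPE is unit-free, it supports the kind of comparison between the Ca and K maps that the abstract describes, whereas RMSE and bias are in the units of each soil property and are only comparable within one property.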

Keywords: Confidence limits; Digital soil mapping; Empirical Bayesian kriging; Random forest; Sequential Gaussian simulation; Support vector machine.

MeSH terms

  • Bayes Theorem
  • Calcium*
  • Environmental Monitoring / methods
  • Machine Learning
  • Potassium
  • Soil*
  • Uncertainty

Substances

  • Soil
  • Potassium
  • Calcium