Using process-oriented model output to enhance machine learning-based soil organic carbon prediction in space and time

Sci Total Environ. 2024 Apr 20:922:170778. doi: 10.1016/j.scitotenv.2024.170778. Epub 2024 Feb 8.

Abstract

Monitoring and modelling soil organic carbon (SOC) in space and time can help us to better understand soil carbon dynamics and is of key importance to support climate change research and policy. Although machine learning (ML) has attracted a lot of attention in the digital soil mapping (DSM) community for its powerful ability to learn from data and predict soil properties, such as SOC, it is better at capturing soil spatial variation than soil temporal dynamics. By contrast, process-oriented (PO) models draw on mechanistic knowledge to express the physicochemical and biological processes that govern SOC temporal changes. Therefore, integrating PO and ML models seems a promising means to represent physically plausible SOC dynamics while retaining the spatial prediction accuracy of ML models. In this study, a hybrid modelling framework was developed and tested for predicting topsoil SOC stock in space and time for a regional cropland area located in eastern China. In essence, the hybrid model uses predictions of the PO model in unsampled years as additional training data for the ML model, with a weighting parameter assigned to balance the importance of SOC values from the PO model and real measurements. The results indicated that temporal trends of SOC stock modelled by the PO and ML models were largely different, while they were notably similar between the PO and hybrid models. Cross-validation showed that the hybrid model had the best performance (RMSE = 0.29 kg m⁻²), with a 19 % improvement compared with the ML model. We conclude that the proposed hybrid framework not only enhances space-time soil carbon mapping in terms of prediction accuracy and physical plausibility, but also provides insights for soil management and policy decisions in the face of future climate change and intensified human activities.

Keywords: Digital soil mapping; Hybrid modelling; Mechanistic knowledge-guided machine learning; Random forest; RothC; Soil carbon dynamics.
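
The hybrid scheme described in the abstract (PO-model predictions for unsampled years added to the ML training data, with a weighting parameter balancing them against real measurements) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the single "year" covariate, the SOC values, and the weights are hypothetical and are not taken from the study. A weighted least-squares line stands in for the random forest so that the sketch needs no external libraries; with scikit-learn, the same per-sample weights could be passed as `sample_weight` to `RandomForestRegressor.fit`.

```python
from math import sqrt


def build_training_set(measured, po_predicted, po_weight):
    """Merge real measurements (weight 1.0) with PO-model predictions
    for unsampled years (weight po_weight) into (x, y, weight) rows."""
    rows = [(x, y, 1.0) for x, y in measured]
    rows += [(x, y, po_weight) for x, y in po_predicted]
    return rows


def fit_weighted_line(rows):
    """Weighted least-squares fit of y = a + b * x.
    Stand-in learner (assumption); the study's keywords name a random forest."""
    sw = sum(w for _, _, w in rows)
    mx = sum(w * x for x, _, w in rows) / sw
    my = sum(w * y for _, y, w in rows) / sw
    sxx = sum(w * (x - mx) ** 2 for x, _, w in rows)
    sxy = sum(w * (x - mx) * (y - my) for x, y, w in rows)
    b = sxy / sxx
    return my - b * mx, b


def rmse(observed, predicted):
    """Root mean squared error, the metric reported in the abstract."""
    n = len(observed)
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


# Hypothetical data: (years since first survey, SOC stock in kg m^-2).
measured = [(0, 3.0), (10, 3.5)]                          # two sampled years
po_predicted = [(2, 3.2), (4, 3.4), (6, 3.6), (8, 3.8)]   # PO output, unsampled years

# po_weight = 0 ignores the PO output; larger values pull the fitted
# temporal trend towards the PO-model trajectory.
_, slope_ml_only = fit_weighted_line(build_training_set(measured, po_predicted, 0.0))
_, slope_hybrid = fit_weighted_line(build_training_set(measured, po_predicted, 1.0))
print(slope_ml_only, slope_hybrid)
```

In practice the weighting parameter would be tuned, for example by cross-validated RMSE against held-out measurements; how the study selected it is not stated in the abstract.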