Integrative environmental modeling of soil carbon fractions based on a new latent variable model approach

Sci Total Environ. 2020 Apr 1:711:134566. doi: 10.1016/j.scitotenv.2019.134566. Epub 2019 Nov 20.

Abstract

Soil-environmental correlation has been extensively studied as a cost-effective method for regional-scale soil attribute modeling. However, commonly used statistical methods in soil-factorial modeling are limited by multicollinearity in big-data soil-factorial prediction datasets and by the mixed types of soil-environmental variables (categorical and continuous). This study addressed both shortcomings, resulting in a new soil-factorial modeling approach. The objective was to develop a novel statistical technique for factorial modeling of topsoil total (TC), organic (SOC), recalcitrant (RC), moderately-available (MC), and hot-water extractable carbon (HC) in Florida. This article introduced a two-step regression technique (2Step-R) that combines linear regressions (i.e., Ridge Regression, RR, and Bayesian Linear Regression) with latent variable models (i.e., Partial Least Squares Regression, PLSR, and Sparse Bayesian Infinite Factor, SBIF) to integrate mixed-type soil-environmental datasets. The results showed that the new technique can derive acceptable models for TC, SOC, RC, and MC predictions (R² > 0.65; residual prediction deviation, RPD > 1.6), but only fair models for HC prediction (R² ≤ 0.60; RPD ≤ 1.6). This novel method improved TC, SOC, and MC prediction accuracies compared with standard PLSR and RR methods. In conclusion, the new modeling approach, which incorporates categorical along with continuous soil-environmental predictor variables in latent variable models, has strong potential to improve soil attribute predictions in other regions.
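The two accuracy measures quoted above can be illustrated with a minimal sketch. It assumes the common definitions R² = 1 − SS_res/SS_tot and RPD = (standard deviation of the observed values) / (root mean square error of prediction). The abstract does not state which variants the authors used (for example, sample versus population standard deviation), so these choices, the function names, and the toy data below are all hypothetical.

```python
import math
from statistics import mean, stdev


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    obs_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean square error of prediction."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(mean(squared_errors))


def rpd(observed, predicted):
    """Residual prediction deviation: sample SD of observations / RMSE (assumed definition)."""
    return stdev(observed) / rmse(observed, predicted)


# Hypothetical validation data, not values from the study.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 1.5, 3.0, 4.5, 4.5]

print(f"R2  = {r_squared(observed, predicted):.2f}")
print(f"RPD = {rpd(observed, predicted):.2f}")
```

Under these definitions, a higher RPD means the prediction error is small relative to the natural spread of the observed values, which is why the abstract reports it alongside R² when judging each carbon fraction model.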

Keywords: Latent variable modeling; Regression model; Soil carbon; Soil-environmental modeling.