Estimation of soil organic matter content based on CARS algorithm coupled with random forest

Spectrochim Acta A Mol Biomol Spectrosc. 2021 Sep 5:258:119823. doi: 10.1016/j.saa.2021.119823. Epub 2021 Apr 20.

Abstract

Soil organic matter (SOM) is an important index for evaluating soil fertility and nutrient availability, and it is also an important component of precision agriculture. To estimate the SOM content of farmland soil quickly and efficiently, we collected 190 farmland soil samples in Jingbian County, measured their SOM content in the laboratory, and acquired the corresponding Vis-NIR spectra. Based on six pretreatment methods, the competitive adaptive reweighted sampling (CARS) algorithm was used to select characteristic wavelengths, and random forest (RF) regression was used to build the SOM prediction model. After CARS screening, the optimal variable sets for the seven spectral variables contained 15, 40, 30, 23, 20, 26, and 23 variables, respectively, and model accuracy improved in each case. Using 15 characteristic variables selected from the 2151 spectral wavelengths as the optimal subset shortened the training time required for RF modeling of SOM and dramatically improved the model's accuracy and predictive ability: for the validation set, R2 increased from 0.21 to 0.96, RPD from 0.46 to 3.02, and RPIQ from 1.25 to 4.41. Among the tested models, the CR-RF model performed best, with R2 and RMSE values of 0.91 and 0.49 for the calibration set and R2, RMSE, RPD, and RPIQ values of 0.96, 0.51, 3.02, and 4.41, respectively, for the validation set. The model thus achieved accurate prediction of the SOM content of the cultivated layer in the study area.
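The abstract reports R2, RMSE, RPD, and RPIQ without defining them. The sketch below shows one common way these metrics are computed in soil spectroscopy; it is not taken from the paper. It assumes RPD is the sample standard deviation of the observed values divided by RMSE, and RPIQ is the interquartile range of the observed values divided by RMSE. The function name `regression_metrics` and the sample values are hypothetical and are not data from the study.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return R2, RMSE, RPD, and RPIQ for observed vs. predicted values.

    Assumed definitions (common in chemometrics, not confirmed by the source):
      RPD  = SD(observed, ddof=1) / RMSE
      RPIQ = (Q3 - Q1 of observed) / RMSE
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    residuals = y_true - y_pred
    rmse = float(np.sqrt(np.mean(residuals ** 2)))

    # Coefficient of determination: 1 - SS_res / SS_tot
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot

    # Spread of the observed values, by standard deviation and by quartiles
    sd = float(np.std(y_true, ddof=1))
    q1, q3 = np.percentile(y_true, [25, 75])

    return {
        "R2": r2,
        "RMSE": rmse,
        "RPD": sd / rmse,
        "RPIQ": float(q3 - q1) / rmse,
    }


# Made-up illustrative values, not measurements from the paper
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
metrics = regression_metrics(observed, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Both RPD and RPIQ scale the prediction error by the spread of the observed SOM values, so larger values indicate a more useful model; RPIQ relies on quartiles and is therefore less sensitive to skewed SOM distributions than RPD.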

Keywords: CARS algorithm; Characteristic wavelength selection; Random forest; Soil organic matter; Vis-NIR spectroscopy.