Improved multivariate modeling for soil organic matter content estimation using hyperspectral indexes and characteristic bands

PLoS One. 2023 Jun 14;18(6):e0286825. doi: 10.1371/journal.pone.0286825. eCollection 2023.

Abstract

Soil organic matter (SOM) is a key indicator of soil fertility. Calculating spectral indexes and screening characteristic bands reduce the redundant information in hyperspectral data and improve the accuracy of SOM prediction. This study aimed to compare how much spectral indexes and characteristic bands each improve model accuracy. A total of 178 topsoil samples (0-20 cm) were collected in the central plain of Jiangsu, East China. First, visible and near-infrared (VNIR, 350-2500 nm) reflectance spectra were measured in the laboratory using an ASD FieldSpec 4 Std-Res spectroradiometer, and inverse-log reflectance (LR), continuum removal (CR), and first-order derivative reflectance (FDR) transformations were applied to the original reflectance (R). Second, optimal spectral indexes (deviation of arch, difference index, ratio index, and normalized difference index) were calculated from each type of VNIR spectra, and characteristic bands were selected from each type of spectra by the competitive adaptive reweighted sampling (CARS) algorithm. Third, SOM prediction models were established with random forest (RF), support vector regression (SVR), deep neural network (DNN), and partial least squares regression (PLSR) methods using the optimal spectral indexes, denoted here as SI-based models. SOM prediction models were likewise established using the characteristic wavelengths, denoted here as CARS-based models. Finally, the accuracy of the SI-based and CARS-based models was compared and assessed, and the optimal model was selected. Results showed: (1) The correlation between the optimal spectral indexes and SOM was enhanced, with absolute correlation coefficients between 0.66 and 0.83. The SI-based models predicted SOM content accurately, with coefficient of determination (R2) values of 0.80 to 0.87 and root mean square error (RMSE) values of 2.40 g/kg to 2.88 g/kg in the validation sets, and relative percent deviation (RPD) values between 2.14 and 2.52. 
(2) The accuracy of CARS-based models differed with modeling method and spectral transformation. For all spectral transformations, PLSR and SVR combined with CARS gave the best predictions (R2 from 0.87 to 0.92, RMSE from 1.91 g/kg to 2.56 g/kg in the validation sets, and RPD from 2.41 to 3.23). For FDR and CR spectra, the DNN and RF models were more accurate (R2 from 0.69 to 0.91, RMSE from 1.90 g/kg to 3.57 g/kg in the validation sets, and RPD from 1.73 to 3.25) than for LR and R spectra (R2 from 0.20 to 0.35, RMSE from 5.08 g/kg to 6.44 g/kg in the validation sets, and RPD from 0.96 to 1.21). (3) Overall, the accuracy of SI-based models was slightly lower than that of CARS-based models. However, the spectral indexes adapted well to the different models, and each SI-based model showed similar accuracy. For different spectra, the accuracy of the CARS-based models varied with the modeling method. (4) The optimal CARS-based model was CARS-CR-SVR (R2: 0.92, RMSE: 1.91 g/kg in the validation set, RPD: 3.23). The optimal SI-based models were SI3-SVR (R2: 0.87, RMSE: 2.40 g/kg in the validation set, RPD: 2.57) and SI-SVR (R2: 0.84, RMSE: 2.63 g/kg in the validation set, RPD: 2.35).
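
The spectral transformations, index screening, and validation statistics described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and all function names are hypothetical. It assumes LR means log10(1/R), approximates FDR by finite differences, screens only the normalized difference index by exhaustive band-pair search on Pearson correlation, and computes RPD as the standard deviation of the observed values divided by RMSE, which is a common definition that the abstract itself does not spell out. Continuum removal, deviation of arch, CARS, and the regression models are omitted.

```python
import numpy as np

def inverse_log(reflectance):
    """LR transform, assumed here to be log10(1 / R)."""
    return np.log10(1.0 / reflectance)

def first_derivative(reflectance, wavelengths):
    """FDR transform, approximated by finite differences along the band axis."""
    return np.gradient(reflectance, wavelengths, axis=-1)

def normalized_difference(spectra, i, j):
    """Normalized difference index of bands i and j for every sample (row)."""
    a, b = spectra[:, i], spectra[:, j]
    return (a - b) / (a + b)

def best_ndi_pair(spectra, som):
    """Search all band pairs for the NDI with the largest |Pearson r| to SOM."""
    n_bands = spectra.shape[1]
    best_r, best_pair = 0.0, None
    for i in range(n_bands):
        for j in range(i + 1, n_bands):
            r = np.corrcoef(normalized_difference(spectra, i, j), som)[0, 1]
            if abs(r) > abs(best_r):
                best_r, best_pair = r, (i, j)
    return best_r, best_pair

def validation_metrics(observed, predicted):
    """Return (R2, RMSE, RPD); RPD is assumed to be SD(observed) / RMSE."""
    residual = observed - predicted
    rmse = np.sqrt(np.mean(residual ** 2))
    r2 = 1.0 - np.sum(residual ** 2) / np.sum((observed - observed.mean()) ** 2)
    rpd = np.std(observed, ddof=1) / rmse
    return r2, rmse, rpd
```

The pair search is quadratic in the number of bands, so a sketch like this is practical only on resampled or pre-filtered spectra; the difference and ratio indexes would follow the same pattern with `a - b` and `a / b`.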

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • China
  • Fertility*
  • Laboratories
  • Soil

Substances

  • Soil

Grants and funding

This work was funded by the Anhui Provincial Natural Science Foundation, grant number 2208085MD88; the National Natural Science Foundation of China, grant number 41501226; and the Research Fund for Doctoral Program of Anhui University of Science and Technology, grant number ZY020. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.