Rapid identification of soil organic matter level via visible and near-infrared spectroscopy: Effects of two-dimensional correlation coefficient and extreme learning machine

Sci Total Environ. 2018 Dec 10:644:1232-1243. doi: 10.1016/j.scitotenv.2018.06.319. Epub 2018 Jul 13.

Abstract

Accurate estimation of soil organic matter (SOM) is essential for understanding its spatial distribution and, in turn, for identifying areas that need fertilization and the grade of fertilizer required. Visible and near-infrared spectroscopy is a promising alternative to time-consuming and costly conventional soil assessment methods. However, this approach depends heavily on the choice of preprocessing strategy and of data mining technique for regression analysis. In this study, 2D correlation coefficients, computed for ratio, difference, and normalized difference indices, were introduced to select sensitive spectral parameters. The performance of the extreme learning machine (ELM) was evaluated against that of the support vector machine (SVM) for SOM estimation. A total of 257 soil samples were collected from Hubei Province, Central China, and their SOM contents and reflectance spectra were measured in the laboratory. Five spectral pretreatments were applied in addition to the raw spectra. SVM and ELM models were calibrated on spectral parameters selected by one-dimensional and 2D correlation coefficients and subsequently applied to predict SOM. Results showed that the 2D correlation coefficients highlighted detailed SOM information more effectively than the one-dimensional correlation coefficient. The ELM models yielded better predictions than the SVM models across all eight established models. The highest estimation accuracy was obtained by combining the 2D ratio index with ELM (the TRI-ELM method), with an independent validation R² of 0.83 and a ratio of performance to interquartile range of 3.49. Among all methods, the SOM fertility levels derived from TRI-ELM predictions were the most similar to those derived from laboratory-measured SOM, and every misclassified sample fell within one level of its measured class. In summary, our study indicates that the TRI-ELM model is a rapid, inexpensive, and relatively accurate method for identifying SOM fertility level.
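
As an illustration of the band-pair selection step, the following is a minimal sketch of a 2D correlation coefficient computed with NumPy. The abstract does not give the index formulas, so the commonly used definitions are assumed here: ratio Ri/Rj, difference Ri - Rj, and normalized difference (Ri - Rj)/(Ri + Rj). The function names `band_pair_correlation` and `best_band_pair` are hypothetical and are not taken from the paper.

```python
import numpy as np


def band_pair_correlation(spectra, som, index="ratio"):
    """Pearson r between SOM and a two-band index, for every band pair.

    spectra : array of shape (n_samples, n_bands), reflectance values
    som     : array of shape (n_samples,), measured SOM contents
    index   : "ratio", "difference", or "normalized_difference"
              (assumed definitions, see the text above)

    Returns an (n_bands, n_bands) matrix whose entry [i, j] is the
    correlation between SOM and the index built from bands i and j.
    Entries where the index is constant (e.g. the diagonal) are NaN.
    """
    spectra = np.asarray(spectra, dtype=float)
    som = np.asarray(som, dtype=float)

    a = spectra[:, :, None]  # band i, shape (n, B, 1)
    b = spectra[:, None, :]  # band j, shape (n, 1, B)

    with np.errstate(divide="ignore", invalid="ignore"):
        if index == "ratio":
            values = a / b
        elif index == "difference":
            values = a - b
        elif index == "normalized_difference":
            values = (a - b) / (a + b)
        else:
            raise ValueError(f"unknown index: {index!r}")

    # Center the index values and the SOM values over the sample axis.
    vc = values - values.mean(axis=0)
    yc = som - som.mean()

    # Pearson r = covariance / (std of index * std of SOM), per band pair.
    num = np.einsum("n,nij->ij", yc, vc)
    den = np.sqrt((vc ** 2).sum(axis=0) * (yc ** 2).sum())
    with np.errstate(divide="ignore", invalid="ignore"):
        return num / den


def best_band_pair(r):
    """Return (i, j) of the band pair with the largest |r|, ignoring NaN."""
    flat = np.nanargmax(np.abs(r))
    i, j = np.unravel_index(flat, r.shape)
    return int(i), int(j)
```

Calling `band_pair_correlation(spectra, som, "ratio")` and passing the result to `best_band_pair` returns the band indices of the most SOM-sensitive ratio index. The sketch holds an n_samples × n_bands × n_bands array in memory, so full-resolution spectra would likely need to be resampled or processed in band blocks first.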

Keywords: Correlation coefficient; Machine learning model; Remote sensing; Soil organic matter fertility level.