Evaluation of Machine Learning Approaches to Predict Soil Organic Matter and pH Using vis-NIR Spectra

Sensors (Basel). 2019 Jan 11;19(2):263. doi: 10.3390/s19020263.

Abstract

Soil organic matter (SOM) and pH are essential soil fertility indictors of paddy soil in the middle-lower Yangtze Plain. Rapid, non-destructive and accurate determination of SOM and pH is vital to preventing soil degradation caused by inappropriate land management practices. Visible-near infrared (vis-NIR) spectroscopy with multivariate calibration can be used to effectively estimate soil properties. In this study, 523 soil samples were collected from paddy fields in the Yangtze Plain, China. Four machine learning approaches-partial least squares regression (PLSR), least squares-support vector machines (LS-SVM), extreme learning machines (ELM) and the Cubist regression model (Cubist)-were used to compare the prediction accuracy based on vis-NIR full bands and bands reduced using the genetic algorithm (GA). The coefficient of determination (R²), root mean square error (RMSE), and ratio of performance to inter-quartile distance (RPIQ) were used to assess the prediction accuracy. The ELM with GA reduced bands was the best model for SOM (SOM: R² = 0.81, RMSE = 5.17, RPIQ = 2.87) and pH (R² = 0.76, RMSE = 0.43, RPIQ = 2.15). The performance of the LS-SVM for pH prediction did not differ significantly between the model with GA (R² = 0.75, RMSE = 0.44, RPIQ = 2.08) and without GA (R² = 0.74, RMSE = 0.45, RPIQ = 2.07). Although a slight increase was observed when ELM were used for prediction of SOM and pH using reduced bands (SOM: R² = 0.81, RMSE = 5.17, RPIQ = 2.87; pH: R² = 0.76, RMSE = 0.43, RPIQ = 2.15) compared with full bands (R² = 0.81, RMSE = 5.18, RPIQ = 2.83; pH: R² = 0.76, RMSE = 0.45, RPIQ = 2.07), the number of wavelengths was greatly reduced (SOM: 201 to 44; pH: 201 to 32). Thus, the ELM coupled with reduced bands by GA is recommended for prediction of properties of paddy soil (SOM and pH) in the middle-lower Yangtze Plain.

Keywords: machine learning approaches; pH; paddy soil; soil organic matter; vis-NIR spectra.