Comparison between Random Forests, Artificial Neural Networks and Gradient Boosted Machines Methods of On-Line Vis-NIR Spectroscopy Measurements of Soil Total Nitrogen and Total Carbon

Sensors (Basel). 2017 Oct 24;17(10):2428. doi: 10.3390/s17102428.

Abstract

Accurate and detailed spatial soil information about within-field variability is essential for variable-rate applications of farm resources. Soil total nitrogen (TN) and total carbon (TC) are important fertility parameters that can be measured with on-line (mobile) visible and near infrared (vis-NIR) spectroscopy. This study compares the performance of local farm scale calibrations with those based on the spiking of selected local samples from both fields into an European dataset for TN and TC estimation using three modelling techniques, namely gradient boosted machines (GBM), artificial neural networks (ANNs) and random forests (RF). The on-line measurements were carried out using a mobile, fiber type, vis-NIR spectrophotometer (305-2200 nm) (AgroSpec from tec5, Germany), during which soil spectra were recorded in diffuse reflectance mode from two fields in the UK. After spectra pre-processing, the entire datasets were then divided into calibration (75%) and prediction (25%) sets, and calibration models for TN and TC were developed using GBM, ANN and RF with leave-one-out cross-validation. Results of cross-validation showed that the effect of spiking of local samples collected from a field into an European dataset when combined with RF has resulted in the highest coefficients of determination (R²) values of 0.97 and 0.98, the lowest root mean square error (RMSE) of 0.01% and 0.10%, and the highest residual prediction deviations (RPD) of 5.58 and 7.54, for TN and TC, respectively. Results for laboratory and on-line predictions generally followed the same trend as for cross-validation in one field, where the spiked European dataset-based RF calibration models outperformed the corresponding GBM and ANN models. In the second field ANN has replaced RF in being the best performing. However, the local field calibrations provided lower R² and RPD in most cases. Therefore, from a cost-effective point of view, it is recommended to adopt the spiked European dataset-based RF/ANN calibration models for successful prediction of TN and TC under on-line measurement conditions.

Keywords: artificial neural networks; gradient boosted machines; on-line vis-NIR measurement; random forests; spiking; total carbon; total nitrogen.