Soil Moisture, Organic Carbon, and Nitrogen Content Prediction with Hyperspectral Data Using Regression Models

Sensors (Basel). 2022 Oct 20;22(20):7998. doi: 10.3390/s22207998.

Abstract

Predicting soil moisture, soil organic carbon, and nitrogen content is a significant field of study because these properties are directly related to plant health and food production. Direct estimation of these soil properties with traditional methods, for example the oven-drying technique and chemical analysis, is time- and resource-consuming and can cover only small areas. With the significant development of remote sensing and hyperspectral (HS) imaging technologies, soil moisture, carbon, and nitrogen can be estimated over vast areas. This paper presents a generalized approach to predicting three essential soil contents through a comprehensive study of various machine learning (ML) models, taking into account dimensionality reduction of the feature space. In this study, we used three popular benchmark HS datasets captured in Germany and Sweden. The efficacy of different ML algorithms in predicting soil content is evaluated, and significant improvement is obtained when a specific range of bands is selected. The performance of the ML models is further improved by applying principal component analysis (PCA), an unsupervised dimensionality reduction method. The effect of soil temperature on soil moisture prediction is also evaluated, and the results show that using soil temperature alongside the HS bands does not improve soil moisture prediction accuracy. However, the combined effect of band selection and feature transformation using PCA significantly enhances the prediction accuracy for soil moisture, carbon, and nitrogen content. This study presents a comprehensive analysis of a wide range of established ML regression models using data preprocessing, effective band selection, and data dimension reduction, and attempts to identify which feature combinations provide the best accuracy. The outcomes of several ML models are verified with validation techniques, and the best- and worst-case scenarios in terms of soil content are noted. The proposed approach outperforms existing estimation techniques.
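
The pipeline summarized in the abstract (band selection, then PCA, then a regression model scored with k-fold cross validation) can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic, the band range 10-19 is arbitrary, all function names are hypothetical, and a single principal component followed by a straight-line fit stands in for the wider range of ML regressors the study compares. Only the Python standard library is used so the sketch runs anywhere; a real pipeline would more likely rely on a numerical library.

```python
import random
from statistics import fmean

def select_bands(spectra, start, stop):
    """Band selection: keep only the band columns in [start, stop)."""
    return [row[start:stop] for row in spectra]

def fit_pc1(rows, iters=100):
    """Estimate the first principal axis by power iteration (assumed simplification)."""
    means = [fmean(col) for col in zip(*rows)]
    centered = [[v - m for v, m in zip(row, means)] for row in rows]
    axis = [1.0 / len(means) ** 0.5] * len(means)
    for _ in range(iters):
        # Multiply by X^T X without forming the covariance matrix explicitly.
        scores = [sum(a * v for a, v in zip(axis, row)) for row in centered]
        new = [sum(s * row[j] for s, row in zip(scores, centered))
               for j in range(len(means))]
        norm = sum(v * v for v in new) ** 0.5
        if norm == 0.0:
            break
        axis = [v / norm for v in new]
    return means, axis

def project(row, means, axis):
    """Score of one spectrum on the first principal axis."""
    return sum(a * (v - m) for a, v, m in zip(axis, row, means))

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope

def cross_val_r2(spectra, target, k=5, seed=0):
    """k-fold cross validation; PCA and regression are refit on each training split."""
    idx = list(range(len(spectra)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    preds, truth = [], []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        means, axis = fit_pc1([spectra[i] for i in train])
        x_train = [project(spectra[i], means, axis) for i in train]
        intercept, slope = fit_line(x_train, [target[i] for i in train])
        for i in fold:
            preds.append(intercept + slope * project(spectra[i], means, axis))
            truth.append(target[i])
    mean_t = fmean(truth)
    ss_res = sum((t - p) ** 2 for t, p in zip(truth, preds))
    ss_tot = sum((t - mean_t) ** 2 for t in truth)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in data: 60 samples, 30 bands; only bands 10-19 respond to moisture.
rng = random.Random(42)
moisture = [rng.uniform(20.0, 45.0) for _ in range(60)]
spectra = [
    [0.6 - (0.01 * m if 10 <= b < 20 else 0.0) + rng.gauss(0.0, 0.01)
     for b in range(30)]
    for m in moisture
]

r2_selected = cross_val_r2(select_bands(spectra, 10, 20), moisture)
print(f"cross-validated R^2 on bands 10-19: {r2_selected:.3f}")
```

Refitting the PCA inside each fold, rather than once on the full dataset, keeps the held-out samples from influencing the feature transformation, so the cross-validated score is not optimistically biased.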

Keywords: LUCAS data; band selection; k-fold cross validation; machine learning; principal component analysis.

MeSH terms

  • Algorithms
  • Carbon
  • Machine Learning
  • Nitrogen* / analysis
  • Soil* / chemistry

Substances

  • Soil
  • Nitrogen
  • Carbon

Grants and funding

This research was funded by Soil CRC Australia (No. 2.S.006 PhD Scholarship) and the APC was funded by Charles Sturt University and Soil CRC Australia.