Machine learning based estimation of field-scale daily, high resolution, multi-depth soil moisture for the Western and Midwestern United States

PeerJ. 2022 Nov 4:10:e14275. doi: 10.7717/peerj.14275. eCollection 2022.

Abstract

Background: High-resolution soil moisture estimates are critical for planning water management and assessing environmental quality. In-situ measurements alone are too costly to support the spatial and temporal resolutions needed for water management. Recent efforts have combined calibration data with machine learning algorithms to fill the gap in high-resolution soil moisture estimates at the field scale. This study aimed to provide calibrated soil moisture models and a methodology for generating gridded estimates of soil moisture at multiple depths, according to user-defined temporal periods, spatial resolution, and extent.

Methods: We applied nearly one million national library soil moisture records from over 100 sites, spanning the U.S. Midwest and West, to build Quantile Random Forest (QRF) calibration models. The QRF models were built on covariates including soil moisture estimates from the North American Land Data Assimilation System (NLDAS), soil properties, climate variables, digital elevation models, and remote sensing-derived indices. We also explored an alternative approach that adopted a regionalized calibration dataset for the Western U.S. The broad-scale QRF models were independently validated by sampling depth, land cover type, and observation period. We then examined how much model performance improved when local samples were used for spiking. Finally, the QRF models were applied to estimate soil moisture at the field scale, where the estimated temporal and spatial patterns were evaluated.
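
The abstract does not describe how a Quantile Random Forest turns its trees into an estimate. As a rough sketch, assuming the standard formulation in which each calibration observation is weighted by how often it shares a leaf with the new point, the estimate at a given quantile is a weighted empirical quantile of the calibration responses. The function name `weighted_quantile`, the toy moisture values, and the leaf weights below are illustrative assumptions, not taken from the study.

```python
def weighted_quantile(values, weights, q):
    """Return the smallest value whose cumulative normalized weight reaches q."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and of equal length")
    total = float(sum(weights))
    if total <= 0.0:
        raise ValueError("weights must sum to a positive number")

    cumulative = 0.0
    # Walk through the responses in ascending order, accumulating weight.
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight / total
        if cumulative >= q - 1e-12:  # small tolerance for float rounding
            return value
    return max(values)


# Hypothetical calibration responses (volumetric soil moisture, m3/m3) and
# hypothetical forest-derived weights for one prediction location.
train_moisture = [0.12, 0.18, 0.22, 0.30, 0.35]
leaf_weights = [0.1, 0.3, 0.4, 0.1, 0.1]

median = weighted_quantile(train_moisture, leaf_weights, 0.50)
lower = weighted_quantile(train_moisture, leaf_weights, 0.05)
upper = weighted_quantile(train_moisture, leaf_weights, 0.95)
print(median, lower, upper)
```

Reporting the lower and upper quantiles alongside the median is what lets a quantile forest attach an interval to each gridded estimate, rather than a single mean value.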

Results: The broad-scale QRF model showed moderate performance (R2 = 0.53, RMSE = 0.078 m3/m3) when data points from all depth layers (up to 100 cm) were considered for an independent validation. Elevation, NLDAS-derived moisture, soil properties, and sampling depth were ranked as the most important covariates. The best model performance was observed for forest and pasture sites (R2 > 0.5; RMSE < 0.09 m3/m3), followed by grassland and cropland (R2 > 0.4; RMSE < 0.11 m3/m3). Model performance decreased with increasing sampling depth and was slightly lower during the winter months. Spiking the national QRF model with local samples improved model performance, reducing the RMSE to less than 0.05 m3/m3 for grassland sites. At the field scale, model estimates captured temporal trends more accurately for surface than for subsurface soil layers. Model-estimated spatial patterns need to be further improved and validated with management data.
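
For readers who want to reproduce this kind of validation summary on their own data, the two reported metrics can be sketched as follows. This assumes R2 is the coefficient of determination (1 minus the ratio of residual to total sum of squares); the abstract does not state which definition the authors used, and the observed and estimated values below are made up for illustration.

```python
import math


def rmse(observed, predicted):
    """Root mean square error, in the same units as the inputs (here m3/m3)."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0.0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


# Hypothetical sensor observations and model estimates (m3/m3).
observed = [0.10, 0.20, 0.30, 0.40]
estimated = [0.12, 0.18, 0.33, 0.37]

print(f"RMSE = {rmse(observed, estimated):.3f} m3/m3")
print(f"R2   = {r_squared(observed, estimated):.3f}")
```

Computing both metrics separately for each depth layer, land cover type, or season would mirror the stratified validation described above.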

Conclusions: The model accuracy for the top 0-20 cm of soil (R2 > 0.5, RMSE < 0.08 m3/m3) showed promise for adopting the methodology for soil moisture monitoring. The success of spiking the national model with local samples showed the need to collect multi-year, high-frequency (e.g., hourly) sensor-based field measurements to improve estimates of soil moisture over longer time periods. Future work should improve model performance at greater depths by adding hydraulic properties and using locally selected calibration datasets.

Keywords: Digital soil mapping; Environmental covariates; Field scale; Grassland; North American Land Data Assimilation System (NLDAS); Remote sensing; Soil climate analysis network (SCAN); Soil moisture downscaling; Spiking; U.S. Climate Reference Network (USCRN).

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Climate
  • Machine Learning
  • Midwestern United States
  • Remote Sensing Technology* / methods
  • Soil*
  • Water / analysis

Substances

  • Soil
  • Water

Grants and funding

This work was supported by the Rangeland Tracking and Management project funded by Conscience Bay Research, LLC and the Woodwell Fund for Climate Solutions and Conscience. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.