Ultrahigh Dimensional Variable Selection for Interpolation of Point Referenced Spatial Data: A Digital Soil Mapping Case Study

PLoS One. 2016 Sep 7;11(9):e0162489. doi: 10.1371/journal.pone.0162489. eCollection 2016.

Abstract

Modern soil mapping is characterised by the need to interpolate point referenced (geostatistical) observations and by the availability of large numbers of environmental characteristics that can be considered as covariates to aid this interpolation. Modelling tasks of this nature also occur in other fields such as biogeography and environmental science. This analysis employs the Least Angle Regression (LAR) algorithm to fit Least Absolute Shrinkage and Selection Operator (LASSO) penalized Multiple Linear Regression models, and demonstrates the efficiency of the LAR algorithm at selecting covariates to aid the interpolation of geostatistical soil carbon observations. Where an exhaustive search of the models that could be constructed from 800 potential covariate terms and 60 observations would be prohibitively demanding, LASSO variable selection is accomplished with trivial computational investment.
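The contrast between exhaustive search and LASSO selection can be illustrated with a minimal sketch. It is not the authors' code or data: the covariate matrix is synthetic, the three "informative" columns and their coefficients are hypothetical, and scikit-learn's `lars_path` with `method="lasso"` is assumed here as one available implementation of the LAR algorithm for the LASSO path. For scale, the 800 candidate terms admit 2^800 possible subsets, a number with 241 digits.

```python
import numpy as np
from sklearn.linear_model import lars_path

# Dimensions quoted in the abstract: 60 observations, 800 candidate terms.
n_obs, n_terms = 60, 800

# Size of the exhaustive search space: every subset of the candidate terms.
n_subsets = 2 ** n_terms
print(f"candidate subsets: a {len(str(n_subsets))}-digit number")

# Synthetic stand-in data (ASSUMPTION: not the soil carbon data of the paper).
rng = np.random.default_rng(0)
X = rng.standard_normal((n_obs, n_terms))
true_idx = [3, 50, 400]                    # hypothetical informative covariates
true_beta = np.array([2.0, -1.5, 1.0])     # hypothetical effect sizes
y = X[:, true_idx] @ true_beta + 0.1 * rng.standard_normal(n_obs)

# Standardise covariates and centre the response, as is usual for the LASSO.
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = y - y.mean()

# Compute the LASSO regularisation path with the LAR algorithm.
# coefs has one row per covariate and one column per step along the path.
alphas, active, coefs = lars_path(X, y, method="lasso")

# Number of covariates with non-zero coefficients at each step of the path.
n_selected = (coefs != 0).sum(axis=0)
print("steps on path:", coefs.shape[1])
print("largest model on path:", int(n_selected.max()), "covariates")
```

Each step along the path adds or drops a single covariate, so the whole sequence of candidate models is obtained in one pass rather than by enumerating subsets. Choosing a point on the path (for example by cross-validation) is a separate step not shown here.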

MeSH terms

  • Algorithms*
  • Carbon / analysis
  • Organic Chemicals / analysis
  • Reproducibility of Results
  • Soil*
  • Statistics as Topic*

Substances

  • Organic Chemicals
  • Soil
  • Carbon

Grants and funding

This work was funded by the CRC for Spatial Information (CRCSI), established and supported under the Australian Government Cooperative Research Centres Programme. One of the authors (BRF) wishes to acknowledge the receipt of a Postgraduate Scholarship from the CRCSI (http://www.crcsi.com.au/).