On the interpretability of predictors in spatial data science: the information horizon

Sci Rep. 2020 Oct 7;10(1):16737. doi: 10.1038/s41598-020-73773-y.

Abstract

Two important theories in spatial modelling relate to structural and spatial dependence. Structural dependence refers to environmental state-factor models, where an environmental property is modelled as a function of the states and interactions of environmental predictors, such as climate, parent material or relief. Commonly, the functions are regression or supervised classification algorithms. Spatial dependence is present in most environmental properties and forms the basis for spatial interpolation and geostatistics. In machine learning, modelling with geographic coordinates or Euclidean distance fields, which resemble linear variograms with infinite ranges, can produce similar interpolations. Interpolations do not lend themselves to causal interpretations. Conversely, with structural modelling, one can, potentially, extract knowledge from the modelling. Two important characteristics of such interpretable environmental modelling are scale and information content. Scale is relevant because very coarse-scale predictors can show nearly infinite ranges, falling outside what we call the information horizon, i.e. interpretation using domain knowledge is not possible. Regarding information content, recent studies have shown that meaningless predictors, such as paintings or photographs of faces, can be used for spatial environmental modelling of ecological and soil properties, with accurate evaluation statistics. Here, we examine under which conditions modelling with such predictors can lead to accurate evaluation statistics and whether an information horizon can be derived for scale and information content.
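The abstract's point that modelling with geographic coordinates amounts to interpolation can be illustrated with a minimal sketch. The synthetic field, the k-nearest-neighbour regressor, and all parameter values below are illustrative assumptions, not the authors' method: a model given only coordinates as predictors can reproduce a spatially dependent property with good evaluation statistics, while offering no causal interpretation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "environmental property": a smooth spatial trend plus noise,
# i.e. a property with strong spatial dependence but no causal predictors here.
n = 400
coords = rng.uniform(0, 10, size=(n, 2))
values = np.sin(coords[:, 0]) + np.cos(coords[:, 1]) + rng.normal(0, 0.05, n)

def knn_predict(train_xy, train_v, query_xy, k=5):
    """Predict by averaging the k nearest training points in coordinate space.
    With coordinates as the only predictors, this is effectively a spatial
    interpolator, not a structural (state-factor) model."""
    d = np.linalg.norm(train_xy[None, :, :] - query_xy[:, None, :], axis=2)
    idx = np.argsort(d, axis=1)[:, :k]
    return train_v[idx].mean(axis=1)

# Hold out points; "modelling" with coordinates alone recovers them accurately,
# yet the fitted relationship says nothing about environmental causes.
train_xy, test_xy = coords[:300], coords[300:]
pred = knn_predict(train_xy, values[:300], test_xy)
rmse = np.sqrt(np.mean((pred - values[300:]) ** 2))
print(f"hold-out RMSE: {rmse:.3f}")
```

The low hold-out error arises purely from spatial dependence: nearby points have similar values, so any predictor monotone in location (coordinates, Euclidean distance fields, or even arbitrary images mapped consistently onto space) can score well on such an evaluation.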

Publication types

  • Research Support, Non-U.S. Gov't