Quantifying the scale effect in geospatial big data using semi-variograms

PLoS One. 2019 Nov 14;14(11):e0225139. doi: 10.1371/journal.pone.0225139. eCollection 2019.

Abstract

The scale effect is an important research topic in geography. When individual-level data are aggregated into areal units, the scale problem is inevitable. The problem becomes more pronounced when collective patterns are mined from big geo-data, because such data are spatially extensive. Although multi-scale models have been constructed to mitigate this issue, most studies still choose a single scale arbitrarily when extracting spatial patterns. In this research, we introduce the nugget-sill ratio (NSR), derived from semi-variograms, as an indicator for selecting the optimal scale. We conducted two simulation experiments to demonstrate the feasibility of this method. Our results showed that the optimal scale is negatively correlated with spatial point density but positively correlated with the degree of dispersion of a point pattern. We also applied the proposed method to a case study using Weibo check-in data from Beijing, Shanghai, Chengdu, and Wuhan. Our study provides a new perspective for measuring the spatial heterogeneity of big geo-data and a means of selecting an optimal spatial scale for big data analytics.
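The abstract does not spell out how the NSR is computed. The sketch below shows one common way to obtain a nugget-to-sill ratio for a point pattern at a given scale. It rests on several assumptions: points are aggregated into counts on a square grid whose cell size stands in for the scale; the empirical semi-variogram uses the classical (Matheron) estimator with distance bins up to half the largest cell separation; a spherical model with nugget c0 and partial sill c is fitted by non-negative least squares over a grid of candidate ranges; and NSR is taken as c0 / (c0 + c). The function names (aggregate_points, empirical_semivariogram, spherical, fit_spherical, nugget_sill_ratio) are hypothetical, and none of these choices is claimed to be the authors' procedure.

```python
# Sketch under stated assumptions; NOT the authors' implementation.
import math
import random


def aggregate_points(points, extent, cell_size):
    """Count points per square cell over [0, extent)^2; returns {(i, j): count}."""
    n = math.ceil(extent / cell_size)
    grid = {(i, j): 0 for i in range(n) for j in range(n)}
    for x, y in points:
        i = min(int(x // cell_size), n - 1)  # clamp points lying on the far edge
        j = min(int(y // cell_size), n - 1)
        grid[(i, j)] += 1
    return grid


def empirical_semivariogram(grid, n_bins=10):
    """Classical estimator gamma(h) = sum (z_a - z_b)^2 / (2 N(h)); lags in cell units."""
    cells = list(grid.items())
    ni = max(i for (i, _), _ in cells) + 1
    nj = max(j for (_, j), _ in cells) + 1
    max_lag = 0.5 * math.hypot(ni - 1, nj - 1)  # half the largest cell separation
    width = max_lag / n_bins
    sq, hs, cnt = [0.0] * n_bins, [0.0] * n_bins, [0] * n_bins
    for a in range(len(cells)):
        (i1, j1), z1 = cells[a]
        for b in range(a + 1, len(cells)):
            (i2, j2), z2 = cells[b]
            h = math.hypot(i1 - i2, j1 - j2)
            if h >= max_lag:
                continue
            k = min(int(h / width), n_bins - 1)
            sq[k] += (z1 - z2) ** 2
            hs[k] += h
            cnt[k] += 1
    lags = [hs[k] / cnt[k] for k in range(n_bins) if cnt[k]]  # mean lag per bin
    gammas = [sq[k] / (2 * cnt[k]) for k in range(n_bins) if cnt[k]]
    return lags, gammas


def spherical(h, a):
    """Unit spherical structure: rises from 0 to 1 at range a, flat beyond it."""
    r = h / a
    return 1.5 * r - 0.5 * r ** 3 if r < 1 else 1.0


def fit_spherical(lags, gammas, ranges=None):
    """Fit gamma(h) = c0 + c * spherical(h, a) with c0, c >= 0 by least squares.

    For fixed a the model is linear in (c0, c): the non-negative optimum is the
    unconstrained fit if feasible, else the better boundary fit (c0 = 0 or c = 0).
    """
    if ranges is None:
        # Candidate ranges start above the first lag so c0 and c stay identifiable.
        lo, hi = min(lags), max(lags)
        ranges = [lo + (hi - lo) * k / 50 for k in range(1, 51)]
    n = len(lags)
    mg = sum(gammas) / n
    best = None
    for a in ranges:
        f = [spherical(h, a) for h in lags]
        mf = sum(f) / n
        sff = sum((fi - mf) ** 2 for fi in f)
        candidates = [(max(0.0, mg), 0.0)]  # pure nugget (c = 0)
        ff = sum(fi * fi for fi in f)
        fg = sum(fi * gi for fi, gi in zip(f, gammas))
        candidates.append((0.0, max(0.0, fg / ff)))  # no nugget (c0 = 0)
        if sff > 0:  # unconstrained two-parameter least squares
            c = sum((fi - mf) * (gi - mg) for fi, gi in zip(f, gammas)) / sff
            c0 = mg - c * mf
            if c >= 0 and c0 >= 0:
                candidates.append((c0, c))
        for c0, c in candidates:
            sse = sum((gi - c0 - c * fi) ** 2 for fi, gi in zip(f, gammas))
            if best is None or sse < best[0]:
                best = (sse, c0, c, a)
    return best[1], best[2], best[3]


def nugget_sill_ratio(points, extent, cell_size, n_bins=10):
    """NSR = c0 / (c0 + c) from a spherical fit to the cell-count semivariogram."""
    grid = aggregate_points(points, extent, cell_size)
    lags, gammas = empirical_semivariogram(grid, n_bins)
    c0, c, _ = fit_spherical(lags, gammas)
    total = c0 + c
    return c0 / total if total > 0 else math.nan


if __name__ == "__main__":
    random.seed(0)
    pts = [(random.uniform(0, 1000), random.uniform(0, 1000)) for _ in range(2000)]
    for cell in (50, 100, 200):  # tabulate NSR per candidate cell size
        print(cell, round(nugget_sill_ratio(pts, 1000, cell), 3))
```

The demo only tabulates NSR for a few cell sizes on uniformly random points. The abstract does not state how NSR values across scales are turned into a single optimal scale, so no selection rule is implemented here. The pairwise loop is O(m²) in the number of cells m, which is acceptable for a sketch but would need a faster estimator for fine grids over an entire city.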

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Big Data*
  • China
  • Data Mining*
  • Evaluation Studies as Topic
  • Geography
  • Humans
  • Models, Theoretical*
  • Spatial Analysis

Grants and funding

This study is funded by the National Science Foundation of China (#41971331 and #41625003). YG and YL received these two grants, respectively. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.