Understanding the bias of mobile location data across spatial scales and over time: A comprehensive analysis of SafeGraph data in the United States

PLoS One. 2024 Jan 19;19(1):e0294430. doi: 10.1371/journal.pone.0294430. eCollection 2024.

Abstract

Mobile location data have emerged as a valuable source for studying human mobility patterns in various contexts, including virus spreading, urban planning, and hazard evacuation. However, these data are often anonymized summaries derived from a panel of tracked mobile devices, and the representativeness of these panels is not well documented. Without a clear understanding of data representativeness, the interpretations of research based on mobile location data may be questionable. This article presents a comprehensive examination of the potential biases associated with mobile location data using SafeGraph Patterns data in the United States as a case study. The research scrutinizes and documents the bias along multiple dimensions (spatial, temporal, urbanization, demographic, and socioeconomic) over a five-year period from 2018 to 2022 and across four geographic levels: state, county, census tract, and census block group. Our analysis of the SafeGraph Patterns dataset revealed an average sampling rate of 7.5% with notable temporal dynamics, geographic disparities, and urban-rural differences. The number of sampled devices was strongly correlated with the census population at the county level over the five years for both urban (r > 0.97) and rural counties (r > 0.91), but less so at the census tract and block group levels. We observed minor sampling biases across gender, age, and moderate-income groups, with biases typically ranging from -0.05 to +0.05. However, minority groups such as Hispanic populations, low-income households, and individuals with low levels of education generally exhibited higher levels of underrepresentation bias that varied over space, time, urbanization, and across geographic levels. These findings provide important insights for future studies that utilize SafeGraph data or other mobile location datasets, highlighting the need to thoroughly evaluate the spatiotemporal dynamics of the bias across spatial scales when employing such data sources.
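
The three quantities reported above (sampling rate, device-population correlation, and group bias) can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so everything here is an assumption: the sampling rate is taken as sampled devices divided by census population, r as the Pearson correlation coefficient, and bias as the difference between a group's share among sampled devices and its share in the census population. The county figures and function names are hypothetical and are not taken from the SafeGraph data or from the article.

```python
from math import sqrt

# Hypothetical toy records: (county id, census population, sampled devices).
# These numbers are invented for illustration only.
counties = [
    ("A", 100_000, 7_400),
    ("B", 250_000, 19_000),
    ("C", 40_000, 3_100),
    ("D", 900_000, 66_000),
]


def sampling_rate(devices, population):
    """Assumed definition: fraction of the census population in the panel."""
    return devices / population


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def share_bias(device_share, population_share):
    """Assumed definition: group share among devices minus share in census.

    Negative values mean the group is underrepresented in the panel.
    """
    return device_share - population_share


populations = [pop for _, pop, _ in counties]
devices = [dev for _, _, dev in counties]

# Per-county sampling rates and the county-level correlation.
rates = {name: sampling_rate(dev, pop) for name, pop, dev in counties}
r_county = pearson_r(populations, devices)

print(rates)
print(r_county)
```

Under these assumptions, a group that makes up 18% of a county's population but only 15% of its sampled devices would have a bias of about -0.03, which falls inside the -0.05 to +0.05 range the abstract describes as minor.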

MeSH terms

  • Bias
  • Humans
  • Income*
  • Population Dynamics
  • Rural Population
  • United States
  • Urbanization*

Grants and funding

The author(s) received no specific funding for this work.