The utility of Zip4 codes in spatial epidemiological analysis

PLoS One. 2023 May 31;18(5):e0285552. doi: 10.1371/journal.pone.0285552. eCollection 2023.

Abstract

There are many public health situations within the United States that require fine geographic scale data to inform response and intervention strategies effectively. However, a condition for accessing and analyzing such data, especially when multiple institutions are involved, is the ability to preserve a degree of spatial privacy and confidentiality. Hospitals and state health departments, which are generally the custodians of these fine-scale health data, are sometimes understandably hesitant to collaborate with each other because of these concerns. This paper examines the utility and pitfalls of using Zip4 codes, a data layer that is often included because it is believed to be "safe", as a means of sharing fine-scale spatial health data that preserves privacy while maintaining sufficient precision for spatial analysis. Although the Zip4 is widely supplied, researchers seldom use it, and its spatial characteristics are not known to data guardians. To address this gap, we use the context of a near-real-time spatial response to an emerging health threat to show how Zip4 aggregation preserves an underlying spatial structure, making it a potentially suitable dataset for analysis. Our results suggest that, based on the density of urbanization, Zip4 centroids are within 150 meters of the real location almost 99% of the time. Spatial analysis experiments performed on these Zip4 data suggest far more insightful geographic output than that obtained from more commonly used aggregation units such as street lines and census block groups. However, this analytical improvement comes at a spatial privacy cost: compared with other aggregations, Zip4 centroids have a higher potential of compromising spatial anonymity, with 73% of addresses having a spatial k-anonymity value below 5. We conclude that while the Zip4 offers an exciting opportunity to share data between organizations, researchers and analysts need to be made aware of its potential for serious confidentiality violations.
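The two quantities reported above, the distance between an address and its Zip4 centroid and the spatial k-anonymity of that centroid, can be illustrated with a short sketch. This is a minimal sketch under stated assumptions, not the authors' method: the abstract does not say how Zip4 centroids were derived or how k was computed. Here the centroid is ASSUMED to be the mean latitude/longitude of all addresses sharing a Zip4 code, and k follows one formulation common in the geomasking literature: the number of addresses lying as close to the published point as the true address does, or closer. The function names (`haversine_m`, `zip4_centroids`, `displacement_and_k`, `summarize`), the Zip4 codes, and the coordinates are hypothetical; the 150 m and k < 5 defaults only mirror the figures quoted in the abstract.

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def zip4_centroids(addresses):
    """ASSUMPTION: a Zip4 centroid is the mean lat/lon of the addresses sharing that code."""
    groups = defaultdict(list)
    for zip4, lat, lon in addresses:
        groups[zip4].append((lat, lon))
    return {
        zip4: (sum(lat for lat, _ in pts) / len(pts), sum(lon for _, lon in pts) / len(pts))
        for zip4, pts in groups.items()
    }


def displacement_and_k(addresses):
    """Return (displacement_m, k) for each (zip4, lat, lon) address, in input order.

    displacement_m: distance from the true address to its Zip4 centroid.
    k: number of addresses (the true one included) whose distance to that centroid
       is no greater than the true address's distance. Brute force, O(n^2).
    """
    centroids = zip4_centroids(addresses)
    results = []
    for zip4, lat, lon in addresses:
        clat, clon = centroids[zip4]
        d = haversine_m(lat, lon, clat, clon)
        k = sum(1 for _, a_lat, a_lon in addresses if haversine_m(a_lat, a_lon, clat, clon) <= d)
        results.append((d, k))
    return results


def summarize(results, radius_m=150.0, k_threshold=5):
    """Share of addresses within radius_m of their centroid, and share with k < k_threshold."""
    n = len(results)
    within = sum(1 for d, _ in results if d <= radius_m) / n
    low_k = sum(1 for _, k in results if k < k_threshold) / n
    return within, low_k


if __name__ == "__main__":
    # Hypothetical Zip4 codes and coordinates, for illustration only.
    addresses = [
        ("12345-0001", 41.0, -81.0),
        ("12345-0001", 41.0, -81.0005),
        ("12345-0001", 41.0, -81.0015),
        ("12345-0002", 40.5, -81.5),   # a Zip4 containing a single address
        ("12345-0003", 40.0, -82.0),
        ("12345-0003", 40.0, -82.02),  # sparse Zip4: members roughly 1.7 km apart
    ]
    results = displacement_and_k(addresses)
    for (zip4, _, _), (d, k) in zip(addresses, results):
        print(f"{zip4}: displacement {d:7.1f} m, k = {k}")
    print(summarize(results))
```

In this toy input the dense Zip4 keeps every member within about 70 m of its centroid, the sparse one pushes both members roughly 850 m away, and the single-address Zip4 shows the privacy risk in its most extreme form: its centroid coincides with the address, so k = 1. The brute-force k count is chosen for readability; a realistic address file would need a spatial index, and k should be computed against all residential addresses rather than cases alone.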

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Confidentiality*
  • Geography
  • Organizations
  • Privacy*
  • Spatial Analysis

Grants and funding

Research reported in this publication was supported by University Hospitals and by the Ohio Department of Higher Education Third Frontier Research Incentive under an award entitled “Geographic Monitoring for Early Disease Detection (GeoMEDD): An Actionable Warning System for Opiate Overdoses in Ohio.”