A Hypothesis Test for Detecting Distance-Specific Clustering and Dispersion in Areal Data

Stella Self; Anna Overby; Anja Zgodic; David White; Alexander McLain; Caitlin Dyckman

doi:10.1016/j.spasta.2023.100757

Spat Stat. 2023 Jun:55:100757. doi: 10.1016/j.spasta.2023.100757. Epub 2023 May 19.

Authors

Stella Self¹, Anna Overby², Anja Zgodic¹, David White³, Alexander McLain¹,⁴, Caitlin Dyckman²,⁴

Affiliations

¹ Arnold School of Public Health, University of South Carolina, 921 Assembly Street, Columbia, SC 29208, USA.
² College of Architecture, Arts and Humanities, Clemson University, Fernow Street, Clemson, SC 29634, USA.
³ College of Behavioral, Social and Health Sciences, Clemson University, Epsilon Zeta Dr, Clemson, SC 29634, USA.
⁴ Shared Last Author.

PMID: 37396190
PMCID: PMC10312012 (available on 2024-06-01)
DOI: 10.1016/j.spasta.2023.100757

Abstract

Spatial clustering detection has a variety of applications in diverse fields, including identifying infectious disease outbreaks, pinpointing crime hotspots, and identifying clusters of neurons in brain imaging applications. Ripley's K-function is a popular method for detecting clustering (or dispersion) in point process data at specific distances. Ripley's K-function measures the expected number of points within a given distance of any observed point. Clustering can be assessed by comparing the observed value of Ripley's K-function to the expected value under complete spatial randomness. While performing spatial clustering analysis on point process data is common, applications to areal data commonly arise and need to be accurately assessed. Inspired by Ripley's K-function, we develop the positive area proportion function (PAPF) and use it to develop a hypothesis testing procedure for the detection of spatial clustering and dispersion at specific distances in areal data. We compare the performance of the proposed PAPF hypothesis test to that of the global Moran's I statistic, the Getis-Ord general G statistic, and the spatial scan statistic with extensive simulation studies. We then evaluate the real-world performance of our method by using it to detect spatial clustering in land parcels containing conservation easements and US counties with high pediatric overweight/obesity rates.

Keywords: Ripley’s K-function; areal data; cluster detection; clustering.
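
As a rough illustration of the comparison the abstract describes for point process data, the sketch below computes a naive estimate of Ripley's K-function and the value expected under complete spatial randomness, which is πr² for a homogeneous planar process. It is not the authors' PAPF procedure, which the abstract does not specify. The function names `ripley_k` and `csr_k` are hypothetical, and the estimator assumes a known study-region area and applies no edge correction.

```python
import numpy as np

def ripley_k(points, radii, area):
    """Naive Ripley's K estimate for 2-D points (assumption: no edge correction)."""
    points = np.asarray(points, dtype=float)
    radii = np.asarray(radii, dtype=float)
    n = len(points)
    # Pairwise Euclidean distances between all points.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Exclude self-pairs by pushing the diagonal out of range.
    np.fill_diagonal(dist, np.inf)
    # For each radius, count ordered pairs (i, j), i != j, within that distance.
    counts = (dist[None, :, :] <= radii[:, None, None]).sum(axis=(1, 2))
    # Scale by area / (n * (n - 1)), i.e. divide by the estimated intensity.
    return area * counts / (n * (n - 1))

def csr_k(radii):
    """Theoretical K under complete spatial randomness in the plane: pi * r^2."""
    return np.pi * np.asarray(radii, dtype=float) ** 2

if __name__ == "__main__":
    # Hypothetical data: uniform random points in the unit square.
    rng = np.random.default_rng(0)
    pts = rng.uniform(0.0, 1.0, size=(200, 2))
    r = np.array([0.05, 0.10, 0.15])
    print("estimated K:", ripley_k(pts, r, area=1.0))
    print("CSR K:      ", csr_k(r))
```

Estimates well above πr² at a given distance suggest clustering at that scale, and estimates well below suggest dispersion. A formal test would also need a reference distribution, for example from Monte Carlo simulation, which this sketch omits.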
