Testing for clustering at many ranges inflates family-wise error rate (FWE)

Matthew Shane Loop; Leslie A McClure

doi:10.1186/1476-072X-14-4

Int J Health Geogr. 2015 Jan 15;14:4. doi: 10.1186/1476-072X-14-4.

Authors

Matthew Shane Loop¹, Leslie A McClure

Affiliation

¹ Department of Biostatistics, University of Alabama at Birmingham, 1665 University Boulevard, RPHB 327, 35294 Birmingham, Alabama, USA. loop2@uab.edu.

Abstract

Background: Testing for clustering at multiple ranges within a single dataset is a common practice in spatial epidemiology. It is not documented whether this approach has an impact on the type 1 error rate.

Methods: We estimated the family-wise error rate (FWE) for the difference in Ripley's K functions test when testing at an increasing number of ranges at an alpha-level of 0.05. Case and control locations were generated from a Cox process on a square area the size of the continental US (≈3,000,000 mi²). Two thousand Monte Carlo replicates were used to estimate the FWE with 95% confidence intervals when testing for clustering at one range, as well as 10, 50, and 100 equidistant ranges.
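The simulation design described in the Methods can be sketched in miniature. The following Python example is an illustration under loudly stated assumptions, not the authors' code: points are uniform on the unit square rather than generated from a Cox process, the K estimate has no edge correction, the null distribution at each range comes from random relabelling of cases and controls, and the sample sizes, ranges, and function names (`k_difference`, `pointwise_reject`, `simulate_rejections`) are all hypothetical choices made to keep the run short.

```python
import numpy as np


def k_difference(within, is_case):
    """Case-minus-control difference in naive K estimates at each range.

    within  : float array (n_ranges, n, n), 1.0 where a pair lies within the range.
    is_case : boolean array (n,).
    Assumes the unit square (area = 1) and applies no edge correction.
    """
    z = is_case.astype(float)
    c = 1.0 - z
    n1, n0 = z.sum(), c.sum()
    k_case = (within @ z) @ z / (n1 * (n1 - 1))
    k_ctrl = (within @ c) @ c / (n0 * (n0 - 1))
    return k_case - k_ctrl


def pointwise_reject(points, is_case, ranges, n_perm, alpha, rng):
    """Separate two-sided Monte Carlo test at every range (no adjustment)."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbour
    within = (d[None, :, :] <= ranges[:, None, None]).astype(float)
    observed = k_difference(within, is_case)
    # Null distribution: shuffle the case/control labels over fixed locations.
    null = np.array([k_difference(within, rng.permutation(is_case))
                     for _ in range(n_perm)])
    n_extreme = (np.abs(null) >= np.abs(observed)).sum(axis=0)
    return (n_extreme + 1) / (n_perm + 1) <= alpha


def simulate_rejections(n_reps=200, n_points=40, n_ranges=10,
                        n_perm=99, alpha=0.05, seed=0):
    """Rejection indicators, one row per null replicate, one column per range."""
    rng = np.random.default_rng(seed)
    ranges = np.linspace(0.05, 0.5, n_ranges)  # equidistant ranges (assumed)
    is_case = np.arange(n_points) < n_points // 2
    reject = np.empty((n_reps, n_ranges), dtype=bool)
    for b in range(n_reps):
        # Cases and controls share one distribution, so the null holds.
        points = rng.uniform(size=(n_points, 2))
        reject[b] = pointwise_reject(points, is_case, ranges, n_perm, alpha, rng)
    return reject


reject = simulate_rejections()
for k in (1, 5, 10):
    # FWE estimate: share of replicates with at least one rejection
    # among the first k ranges.
    print(k, reject[:, :k].any(axis=1).mean())
```

Because the tested ranges are nested in this sketch (the first 1, 5, or 10 of the same grid), the estimated FWE cannot decrease as more ranges are tested. The study instead spaced each set of ranges equidistantly, so the numbers printed here are not comparable to the reported estimates.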

Results: The estimated FWE and 95% confidence intervals when testing 10, 50, and 100 ranges were 0.22 (0.20 - 0.24), 0.34 (0.31 - 0.36), and 0.36 (0.34 - 0.38), respectively.
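Two quick calculations help put these estimates in context. The abstract does not say how the confidence intervals were computed, so the sketch below assumes a normal-approximation (Wald) interval for a proportion estimated from 2,000 replicates. It also computes the FWE that m fully independent tests at level 0.05 would produce, 1 - (1 - 0.05)^m, as a reference point. Both function names are hypothetical.

```python
import math


def wald_ci(p_hat, n, z=1.96):
    """Normal-approximation 95% interval for a proportion (assumed method)."""
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def fwe_if_independent(alpha, m):
    """FWE for m independent tests, each at level alpha."""
    return 1.0 - (1.0 - alpha) ** m


# Reported FWE estimates for 10, 50, and 100 ranges.
for m, p_hat in [(10, 0.22), (50, 0.34), (100, 0.36)]:
    low, high = wald_ci(p_hat, 2000)
    print(m, round(low, 3), round(high, 3),
          round(fwe_if_independent(0.05, m), 3))
```

Under independence the FWE would be roughly 0.40, 0.92, and 0.99 for 10, 50, and 100 tests. The reported estimates are well below these values, which is what one would expect if the test statistics at neighbouring ranges are positively correlated, although the inflation above 0.05 remains substantial.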

Conclusions: Testing for clustering at multiple ranges within a single dataset inflated the FWE above the nominal level of 0.05. Investigators should construct simultaneous critical envelopes (available in the spatstat package in R), or use a test statistic that integrates the test statistics from each range, as suggested by the creators of the difference in Ripley's K functions test.
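Both recommended remedies reduce the many per-range comparisons to a single test. The Python sketch below illustrates the idea only; it is not the spatstat implementation or the original authors' statistic. It assumes the analyst already has the observed difference curve D(r) at each range plus curves simulated under the null, measures deviations from zero, and standardises by the Monte Carlo standard deviation (assumed nonzero at every range) rather than an analytic variance. The function names are hypothetical.

```python
import numpy as np


def max_deviation_pvalue(observed, simulated):
    """One global test based on the largest |D(r)| over all ranges.

    observed  : array (n_ranges,), observed difference curve.
    simulated : array (n_sims, n_ranges), curves simulated under the null.
    Mirrors the logic of a simultaneous envelope: the curve is judged
    by its single worst excursion rather than range by range.
    """
    obs_stat = np.max(np.abs(observed))
    sim_stat = np.max(np.abs(simulated), axis=1)
    return (np.sum(sim_stat >= obs_stat) + 1) / (len(sim_stat) + 1)


def integrated_pvalue(observed, simulated):
    """One global test based on the sum of standardised D(r) over ranges."""
    sd = simulated.std(axis=0, ddof=1)  # Monte Carlo SD per range (assumed > 0)
    obs_stat = np.sum(observed / sd)
    sim_stat = np.sum(simulated / sd, axis=1)
    return (np.sum(np.abs(sim_stat) >= abs(obs_stat)) + 1) / (len(sim_stat) + 1)


# Toy input: 99 simulated null curves over 10 ranges and a flat observed curve.
rng = np.random.default_rng(1)
simulated = rng.normal(size=(99, 10))
observed = np.zeros(10)
print(max_deviation_pvalue(observed, simulated))
print(integrated_pvalue(observed, simulated))
```

Either function returns one p-value per dataset, so comparing it with 0.05 controls the error rate across all ranges at once, up to Monte Carlo error.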

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Case-Control Studies
Cluster Analysis*
Cohort Studies
Humans
Models, Statistical*
Research Design* / statistics & numerical data

Grants and funding

T32 HL079888/HL/NHLBI NIH HHS/United States