Testing for clustering at many ranges inflates family-wise error rate (FWE)

Int J Health Geogr. 2015 Jan 15;14:4. doi: 10.1186/1476-072X-14-4.

Abstract

Background: Testing for clustering at multiple ranges within a single dataset is a common practice in spatial epidemiology. Whether this practice affects the type I error rate has not been documented.

Methods: We estimated the family-wise error rate (FWE) of the difference in Ripley's K functions test when testing at an increasing number of ranges, each at an alpha level of 0.05. Case and control locations were generated from a Cox process on a square area the size of the continental US (≈3,000,000 mi²). Two thousand Monte Carlo replicates were used to estimate the FWE, with 95% confidence intervals, when testing for clustering at one range and at 10, 50, and 100 equidistant ranges.
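The simulation described in the Methods can be sketched in Python. This is a minimal illustration under assumptions that are not from the paper: a fixed number of cases and controls placed uniformly at random on a unit square (a binomial process, not a Cox process on a US-sized region), the naive K estimator with no edge correction, and a two-sided random-labelling Monte Carlo test with 99 relabellings, run separately at each range. The function names `k_hat`, `d_curve`, `any_pointwise_rejection`, and `estimate_fwe` are hypothetical.

```python
import numpy as np


def k_hat(dist: np.ndarray, radii: np.ndarray, area: float) -> np.ndarray:
    """Naive Ripley's K estimate (no edge correction) from a square distance matrix."""
    n = dist.shape[0]
    pair_d = np.sort(dist[np.triu_indices(n, k=1)])  # each unordered pair once
    n_close = np.searchsorted(pair_d, radii, side="right")  # pairs with distance <= r
    return area * 2.0 * n_close / (n * (n - 1))


def d_curve(dist: np.ndarray, is_case: np.ndarray, radii: np.ndarray, area: float) -> np.ndarray:
    """D(r) = K_cases(r) - K_controls(r) for one labelling of the points."""
    k_cases = k_hat(dist[np.ix_(is_case, is_case)], radii, area)
    k_controls = k_hat(dist[np.ix_(~is_case, ~is_case)], radii, area)
    return k_cases - k_controls


def any_pointwise_rejection(points, is_case, radii, rng, n_sim=99, alpha=0.05, area=1.0) -> bool:
    """Test D(r) separately at every range by random labelling; True if any range rejects."""
    dist = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1))
    d_obs = d_curve(dist, is_case, radii, area)
    d_sim = np.array([d_curve(dist, rng.permutation(is_case), radii, area) for _ in range(n_sim)])
    # Two-sided Monte Carlo p-value at each range, computed independently of the other ranges.
    p_values = (1 + (np.abs(d_sim) >= np.abs(d_obs)).sum(axis=0)) / (n_sim + 1)
    return bool((p_values <= alpha).any())


def estimate_fwe(n_ranges, n_reps=200, n_cases=30, n_controls=30, r_max=0.25, seed=1) -> float:
    """Share of null replicates in which at least one of n_ranges per-range tests rejects."""
    rng = np.random.default_rng(seed)
    radii = np.linspace(r_max / n_ranges, r_max, n_ranges)  # equidistant ranges
    is_case = np.arange(n_cases + n_controls) < n_cases
    rejections = 0
    for _ in range(n_reps):
        # Null hypothesis: cases and controls share one spatial distribution
        # (here, points placed uniformly at random on the unit square).
        points = rng.uniform(0.0, 1.0, size=(n_cases + n_controls, 2))
        rejections += any_pointwise_rejection(points, is_case, radii, rng)
    return rejections / n_reps
```

Calling `estimate_fwe(1)` and `estimate_fwe(10)` with the same seed reuses identical point patterns and relabellings, and the 10-range grid includes the single range `r_max`, so the 10-range estimate can never fall below the 1-range one. How far above 0.05 it climbs is the kind of quantity the abstract reports.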

Results: The estimated FWE and 95% confidence intervals when testing 10, 50, and 100 ranges were 0.22 (0.20 - 0.24), 0.34 (0.31 - 0.36), and 0.36 (0.34 - 0.38), respectively.
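These results can be cross-checked with two small calculations. The sketch below assumes a normal-approximation (Wald) binomial interval over 2,000 replicates; the abstract does not say which interval method was used. It also computes, for comparison, the FWE that 10, 50, or 100 fully independent tests at 0.05 would produce. The function names are illustrative.

```python
import math


def wald_ci(p_hat: float, n_reps: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation 95% interval for a proportion estimated from n_reps replicates."""
    se = math.sqrt(p_hat * (1.0 - p_hat) / n_reps)
    return p_hat - z * se, p_hat + z * se


def fwe_if_independent(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false rejection among n_tests independent level-alpha tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


for n_ranges, fwe in [(10, 0.22), (50, 0.34), (100, 0.36)]:
    low, high = wald_ci(fwe, 2000)
    print(n_ranges, round(low, 2), round(high, 2), round(fwe_if_independent(n_ranges), 3))
```

This gives intervals of roughly 0.20 to 0.24, 0.32 to 0.36, and 0.34 to 0.38, within about 0.01 of those reported, against independent-test FWEs of about 0.40, 0.92, and 0.99. The reported FWEs sit well below the independence values, which is consistent with tests at neighbouring ranges being strongly positively correlated; that reading is an interpretation, not a finding stated in the abstract.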

Conclusions: Testing for clustering at multiple ranges within a single dataset inflated the FWE above the nominal level of 0.05. Investigators should construct simultaneous critical envelopes (available in the spatstat package for R) or use a test statistic that integrates the test statistics from each range, as proposed by the developers of the difference in Ripley's K functions test.
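Both remedies replace many per-range decisions with a single test per dataset. The sketch below is one hedged way to express them, given an observed curve D(r_k) = K_cases(r_k) - K_controls(r_k) and a matrix of random-labelling curves: a global test on the largest standardized deviation across ranges (the idea behind simultaneous envelopes such as those from spatstat's `envelope()` with `global = TRUE`, whose details, such as how deviations are scaled, may differ), and a sum of standardized D(r_k) over ranges in the spirit of the integrated statistic. Standardizing by the pooled standard deviation of all curves, the one-sided direction of the integrated test, and the function name are assumptions.

```python
import numpy as np


def global_and_integrated_pvalues(d_obs, d_sim) -> tuple[float, float]:
    """d_obs: shape (m,), observed D(r_k); d_sim: shape (n_sim, m), random-labelling D curves."""
    d_obs = np.asarray(d_obs, dtype=float)
    d_sim = np.asarray(d_sim, dtype=float)
    n_sim = d_sim.shape[0]
    # Scale each range by the SD of all curves (observed + simulated), treating them symmetrically.
    sd = np.vstack([d_obs, d_sim]).std(axis=0, ddof=1)
    sd = np.where(sd > 0, sd, 1.0)  # avoid division by zero where every curve is identical
    z_obs, z_sim = d_obs / sd, d_sim / sd
    # Simultaneous (global) test: one statistic, the largest standardized |D| over all ranges.
    t_obs = np.abs(z_obs).max()
    t_sim = np.abs(z_sim).max(axis=1)
    p_global = (1 + np.sum(t_sim >= t_obs)) / (n_sim + 1)
    # Integrated test: sum of standardized D over ranges; large positive values suggest case clustering.
    s_obs = z_obs.sum()
    s_sim = z_sim.sum(axis=1)
    p_integrated = (1 + np.sum(s_sim >= s_obs)) / (n_sim + 1)
    return float(p_global), float(p_integrated)
```

With 99 relabellings the smallest attainable p-value is 0.01. Because each approach yields one p-value per dataset, testing it at 0.05 keeps the family-wise error at or below the nominal level under random labelling (ties make it slightly conservative), however many ranges enter the statistic.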

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Case-Control Studies
  • Cluster Analysis*
  • Cohort Studies
  • Humans
  • Models, Statistical*
  • Research Design* / statistics & numerical data