Measurement error in time-series analysis: a simulation study comparing modelled and monitored data

BMC Med Res Methodol. 2013 Nov 13;13:136. doi: 10.1186/1471-2288-13-136.

Abstract

Background: Assessing health effects from background exposure to air pollution is often hampered by the sparseness of pollution monitoring networks. However, regional atmospheric chemistry-transport models (CTMs) can provide pollution data with national coverage at fine geographical and temporal resolution. We used statistical simulation to compare how additive measurement error affects epidemiological time-series analysis when exposure is taken from sparse monitor data as opposed to geographically and temporally complete model data.

Methods: Statistical simulations were based on a theoretical area of four regions, each consisting of twenty-five 5 km × 5 km grid squares. In the context of a 3-year Poisson regression time-series analysis of the association between mortality and a single pollutant, we compared the impact of measurement error when using daily grid-specific model data as opposed to daily regional-average monitor data. We investigated how this comparison changed as we varied the number of grid squares per region containing a monitor. To inform simulations, estimates (e.g. of pollutant means) were obtained from observed monitor data for 2003-2006 at national network sites across the UK and from corresponding model data generated by the EMEP-WRF CTM. Average within-site correlations between observed monitor and model data were 0.73 and 0.76 for rural and urban daily maximum 8-hour ozone, respectively, and 0.67 and 0.61 for rural and urban log_e(daily 1-hour maximum NO2).
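The comparison described in the Methods can be sketched, in heavily simplified form, as a small simulation. This is a hypothetical illustration, not the authors' code: it assumes a single region of 25 grid squares rather than four, true exposure built from a shared daily regional component plus independent grid-specific deviations, no seasonality or other confounders, and made-up parameter values (`beta`, the standard deviations and `base_deaths` are not the paper's estimates). "Monitor" exposure is the average of the true values at `n_monitors` grid squares, assigned to every grid square in the region; "model" exposure is grid-specific but carries additive classical error. The helper names `simulate` and `fit_poisson_slope` are hypothetical, and the Poisson regression is fitted with a hand-written Newton-Raphson loop rather than a library routine.

```python
import numpy as np


def fit_poisson_slope(x, y, n_iter=50):
    """Fit log E[y] = b0 + b1 * x by Newton-Raphson (IRLS for the log link); return b1."""
    X = np.column_stack([np.ones_like(x), x])
    coef = np.array([np.log(y.mean()), 0.0])  # start at the null model
    for _ in range(n_iter):
        mu = np.exp(X @ coef)
        score = X.T @ (y - mu)                # gradient of the Poisson log-likelihood
        info = X.T @ (X * mu[:, None])        # Fisher information
        step = np.linalg.solve(info, score)
        coef = coef + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return coef[1]


def simulate(n_days=1095, n_grids=25, n_monitors=1, beta=0.1,
             sd_region=1.0, sd_grid=0.5, sd_model_err=0.7,
             base_deaths=5.0, seed=0):
    """Return estimated/true coefficient ratios for true, monitor and model exposure.

    All parameter values are hypothetical, chosen only to illustrate the design.
    """
    rng = np.random.default_rng(seed)
    # True exposure: daily regional component shared by all grids + grid-specific deviation
    region = rng.normal(0.0, sd_region, size=(n_days, 1))
    x_true = region + rng.normal(0.0, sd_grid, size=(n_days, n_grids))
    # Daily death counts in each grid square depend on the true grid-level exposure
    y = rng.poisson(base_deaths * np.exp(beta * x_true))
    # Monitor exposure: average of true values at n_monitors fixed grids, given to every grid
    sites = rng.choice(n_grids, size=n_monitors, replace=False)
    regional_avg = x_true[:, sites].mean(axis=1, keepdims=True)
    x_monitor = np.repeat(regional_avg, n_grids, axis=1)
    # Model exposure: grid-specific, but with additive classical error
    x_model = x_true + rng.normal(0.0, sd_model_err, size=x_true.shape)
    ratios = {}
    for name, x in (("true", x_true), ("monitor", x_monitor), ("model", x_model)):
        ratios[name] = fit_poisson_slope(x.ravel(), y.ravel().astype(float)) / beta
    return ratios


if __name__ == "__main__":
    for k in (1, 5, 25):
        print(k, simulate(n_monitors=k, seed=k))
```

Because these hypothetical exposures are jointly Gaussian, theory predicts a model-data ratio of about 1.25 / (1.25 + 0.49) ≈ 0.72 and a single-monitor ratio of roughly 0.8, with the monitor ratio approaching 1 as more grid squares carry a monitor; individual runs scatter around these values. A fuller replication would add the remaining regions, seasonal terms and parameter values estimated from the monitor and EMEP-WRF data.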

Results: When regional averages were based on 5 or 10 monitors per region, health effect estimates exhibited little bias. However, with only 1 monitor per region, the regression coefficient in our time-series analysis was attenuated by an estimated 6% for urban background ozone, 13% for rural ozone, 29% for urban background log_e(NO2) and 38% for rural log_e(NO2). For grid-specific model data the corresponding figures were 19%, 22%, 54% and 44%, respectively, i.e. greater attenuation than with 1 monitor per region for every pollutant, though similar for rural log_e(NO2) and most marked for urban background log_e(NO2).
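Reading "attenuated by A%" in the usual way, as the shortfall of the expected estimate relative to the true coefficient, the reported percentages convert to coefficient ratios as follows. The two example values are the 6% (urban background ozone, 1 monitor per region) and 54% (urban background log_e(NO2), model data) figures above; the true rate ratio of 1.05 in the last line is a hypothetical value used only to show what such attenuation looks like on the rate-ratio scale.

```latex
\begin{align*}
A &= 100\left(1 - \frac{\mathbb{E}[\hat\beta]}{\beta}\right)\%
  \quad\Longleftrightarrow\quad
  \mathbb{E}[\hat\beta] = \left(1 - \frac{A}{100}\right)\beta \\
A = 6\% &\;\Rightarrow\; \mathbb{E}[\hat\beta] \approx 0.94\,\beta,
  \qquad A = 54\% \;\Rightarrow\; \mathbb{E}[\hat\beta] \approx 0.46\,\beta \\
e^{\beta} = 1.05 &\;\Rightarrow\; \beta = \ln 1.05 \approx 0.0488,
  \qquad e^{0.46\beta} \approx e^{0.0224} \approx 1.023
\end{align*}
```

So, under this reading, a true rate ratio of 1.05 per unit exposure would on average be estimated as about 1.023 when the coefficient is attenuated by 54%.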

Conclusion: Even if correlations between model and monitor data appear reasonably strong, additive classical measurement error in model data may lead to appreciable bias in health effect estimates. As process-based air pollution models become more widely used in epidemiological time-series analysis, assessments of error impact that include statistical simulation may be useful.
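The point that a reasonably strong correlation can coexist with appreciable bias follows from the textbook regression-dilution result for classical additive error, sketched below for a true exposure X, an error-prone surrogate Z = X + U with U independent of X, and nondifferential error (Y depends on Z only through X). This is a standard simplification, not the paper's full error model; in particular, the monitor-model correlations quoted in the Methods are not corr(X, Z) for the true exposure, because monitor data carry their own error, so they should not be substituted directly. The Gaussian assumption is what makes the Poisson log-linear case exact.

```latex
\begin{align*}
Z &= X + U, \qquad U \perp X, \quad \mathbb{E}[U] = 0 \\
\lambda &= \frac{\operatorname{Var}(X)}{\operatorname{Var}(X) + \operatorname{Var}(U)}
        = \operatorname{corr}(X, Z)^2 \\
\mathbb{E}[X \mid Z] &= \mu_X + \lambda\,(Z - \mu_X)
  \qquad (X,\ U \text{ jointly Gaussian}) \\
\mathbb{E}[Y \mid Z] &= \exp\!\left(\alpha + \beta\,\mathbb{E}[X \mid Z]
  + \tfrac{1}{2}\beta^{2}\operatorname{Var}(X \mid Z)\right)
  \qquad \text{if } \log \mathbb{E}[Y \mid X] = \alpha + \beta X \\
  &= \exp\!\left(\text{const} + \lambda\beta Z\right) \\
\operatorname{corr}(X, Z) = 0.8 &\;\Rightarrow\; \lambda = 0.64
  \;\Rightarrow\; 36\%\ \text{attenuation}
\end{align*}
```

Because the attenuation factor is the squared correlation, a correlation that looks strong (0.8 in this hypothetical example) still removes over a third of the effect.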

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Air Pollutants / chemistry
  • Air Pollution / statistics & numerical data
  • Algorithms
  • Bias
  • Computer Simulation*
  • Data Interpretation, Statistical
  • Humans
  • Linear Models
  • Models, Chemical*
  • Models, Statistical
  • Nitrogen Dioxide / chemistry
  • Ozone / chemistry
  • Poisson Distribution
  • Regression Analysis
  • Research Design
  • Time Factors
  • United Kingdom

Substances

  • Air Pollutants
  • Ozone
  • Nitrogen Dioxide