Measurement error in time-series analysis: a simulation study comparing modelled and monitored data

Barbara K Butland; Ben Armstrong; Richard W Atkinson; Paul Wilkinson; Mathew R Heal; Ruth M Doherty; Massimo Vieno

doi:10.1186/1471-2288-13-136

BMC Med Res Methodol. 2013 Nov 13;13:136. doi: 10.1186/1471-2288-13-136.

Authors

Barbara K Butland, Ben Armstrong¹, Richard W Atkinson, Paul Wilkinson, Mathew R Heal, Ruth M Doherty, Massimo Vieno

Affiliation

¹ Department of Social and Environmental Health Research, London School of Hygiene and Tropical Medicine, 15-17 Tavistock Place, London WC1H 9SH, UK. ben.armstrong@lshtm.ac.uk.

Abstract

Background: Assessing health effects from background exposure to air pollution is often hampered by the sparseness of pollution monitoring networks. However, regional atmospheric chemistry-transport models (CTMs) can provide pollution data with national coverage at fine geographical and temporal resolution. We used statistical simulation to compare the impact on epidemiological time-series analysis of additive measurement error in sparse monitor data as opposed to geographically and temporally complete model data.

Methods: Statistical simulations were based on a theoretical area of 4 regions each consisting of twenty-five 5 km × 5 km grid-squares. In the context of a 3-year Poisson regression time-series analysis of the association between mortality and a single pollutant, we compared the error impact of using daily grid-specific model data as opposed to daily regional average monitor data. We investigated how this comparison was affected if we changed the number of grids per region containing a monitor. To inform simulations, estimates (e.g. of pollutant means) were obtained from observed monitor data for 2003-2006 for national network sites across the UK and corresponding model data that were generated by the EMEP-WRF CTM. Average within-site correlations between observed monitor and model data were 0.73 and 0.76 for rural and urban daily maximum 8-hour ozone respectively, and 0.67 and 0.61 for rural and urban log_e(daily 1-hour maximum NO2).
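The kind of simulation described in the Methods can be illustrated with a much-reduced sketch. It is not the authors' design: it uses a single daily exposure series rather than 4 regions of 25 grid-squares, it adds purely classical Gaussian error, and every numerical value (baseline deaths, true coefficient, standard deviations) and every function name is a hypothetical choice made for illustration. It generates 3 years of daily Poisson death counts from a "true" exposure, refits the regression using an error-prone version of that exposure, and reports the ratio of the mean estimated coefficient to the true one.

```python
import numpy as np


def fit_poisson_slope(x, y, n_iter=25):
    """Fit log E[y] = b0 + b1 * x by Newton-Raphson and return b1."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.array([np.log(y.mean()), 0.0])  # start at the intercept-only fit
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        grad = X.T @ (y - mu)              # score vector
        hess = X.T @ (X * mu[:, None])     # Fisher information
        beta = beta + np.linalg.solve(hess, grad)
    return beta[1]


def simulate_attenuation(n_days=1095, n_sims=200, true_beta=0.01,
                         sd_true=10.0, sd_error=5.0, baseline=20.0, seed=0):
    """Return mean(estimated slope) / true slope under classical error.

    All default values are hypothetical, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    estimates = np.empty(n_sims)
    for s in range(n_sims):
        # Centred "true" daily exposure and the deaths it generates.
        true_x = rng.normal(0.0, sd_true, n_days)
        deaths = rng.poisson(baseline * np.exp(true_beta * true_x))
        # Error-prone exposure: truth plus independent classical error.
        observed_x = true_x + rng.normal(0.0, sd_error, n_days)
        estimates[s] = fit_poisson_slope(observed_x, deaths)
    return estimates.mean() / true_beta


if __name__ == "__main__":
    print("ratio with error:   ", simulate_attenuation())
    print("ratio without error:", simulate_attenuation(sd_error=0.0))
```

With these assumed defaults the error variance is a quarter of the true exposure variance, so the ratio should come out near 100 / (100 + 25) = 0.8, i.e. roughly 20% attenuation, while setting `sd_error=0.0` should give a ratio near 1.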

Results: When regional averages were based on 5 or 10 monitors per region, health effect estimates exhibited little bias. However, with only 1 monitor per region, the regression coefficient in our time-series analysis was attenuated by an estimated 6% for urban background ozone, 13% for rural ozone, 29% for urban background log_e(NO2) and 38% for rural log_e(NO2). For grid-specific model data the corresponding figures were 19%, 22%, 54% and 44% respectively, i.e. similar for rural log_e(NO2) but more marked for urban log_e(NO2).

Conclusion: Even if correlations between model and monitor data appear reasonably strong, additive classical measurement error in model data may lead to appreciable bias in health effect estimates. As process-based air pollution models become more widely used in epidemiological time-series analysis, assessments of error impact that include statistical simulation may be useful.
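For orientation, the textbook relationship between additive classical error and attenuation can be written out as below. This is the standard result for simple linear regression, which holds approximately for log-linear Poisson models with small effects; it is not the authors' own derivation, and the symbols X_t, U_t, W_t and λ are notation assumed here rather than taken from the abstract.

```latex
% True exposure X_t, observed exposure W_t, classical error U_t
% (mean zero, independent of X_t):
W_t = X_t + U_t, \qquad
\operatorname{E}[U_t] = 0, \qquad
\operatorname{Cov}(X_t, U_t) = 0 .

% Regressing the outcome on W_t instead of X_t shrinks the coefficient
% \beta towards zero by the reliability ratio \lambda:
\hat{\beta}_W \;\longrightarrow\; \lambda \beta, \qquad
\lambda = \frac{\sigma_X^2}{\sigma_X^2 + \sigma_U^2}, \qquad
\text{attenuation (\%)} = 100\,(1 - \lambda).
```

If this simple model held, an attenuation of 29% would correspond to λ = 0.71, i.e. error variance of roughly 40% of the true exposure variance.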

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Air Pollutants / chemistry
Air Pollution / statistics & numerical data
Algorithms
Bias
Computer Simulation*
Data Interpretation, Statistical
Humans
Linear Models
Models, Chemical*
Models, Statistical
Nitrogen Dioxide / chemistry
Ozone / chemistry
Poisson Distribution
Regression Analysis
Research Design
Time Factors
United Kingdom

Substances

Air Pollutants
Ozone
Nitrogen Dioxide