Europe-wide air pollution modeling from 2000 to 2019 using geographically weighted regression

Environ Int. 2022 Oct:168:107485. doi: 10.1016/j.envint.2022.107485. Epub 2022 Aug 24.

Abstract

Previous European land-use regression (LUR) models assumed fixed linear relationships between air pollution concentrations and predictors such as traffic and land use. We evaluated whether including spatially varying relationships could improve European LUR models by using geographically weighted regression (GWR) and random forest (RF). We built separate LUR models for each year from 2000 to 2019 for NO2, O3, PM2.5 and PM10 using annual average monitoring observations across Europe. Potential predictors included satellite retrievals, chemical transport model (CTM) estimates and land-use variables. Supervised linear regression (SLR) was used to select predictors, and GWR was then used to estimate the potentially spatially varying coefficients. We also developed multi-year models using geographically and temporally weighted regression (GTWR). Five-fold cross-validation per year showed that GWR and GTWR explained similar shares of the spatial variation in annual average concentrations (average R2 = NO2: 0.66; O3: 0.58; PM10: 0.62; PM2.5: 0.77), outperforming both SLR (average R2 = NO2: 0.61; O3: 0.46; PM10: 0.51; PM2.5: 0.75) and RF (average R2 = NO2: 0.64; O3: 0.53; PM10: 0.56; PM2.5: 0.67). The GTWR predictions were overall highly correlated (R2 > 0.8) for all pollutants with those from a previously used method of back-extrapolating 2010 model predictions using the CTM. Including spatially varying relationships using GWR modestly improved European annual air pollution LUR models, which supports time-varying exposure-health risk models.
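
The idea of a spatially varying coefficient can be illustrated with a minimal sketch of GWR as locally weighted least squares. This is not the authors' implementation: the Gaussian kernel, the fixed bandwidth, the synthetic data and the function name `gwr_coefficients` are all assumptions made for illustration only.

```python
import numpy as np

def gwr_coefficients(coords, X, y, target, bandwidth):
    """Fit a weighted least-squares model centred on one target location.

    Hypothetical helper: observations near `target` get larger weights
    (Gaussian kernel, fixed bandwidth), so the fitted coefficients are local.
    """
    dist = np.linalg.norm(coords - target, axis=1)
    weights = np.exp(-0.5 * (dist / bandwidth) ** 2)
    design = np.column_stack([np.ones(len(X)), X])  # add an intercept column
    sqrt_w = np.sqrt(weights)
    # Weighted least squares solved as ordinary least squares on scaled rows
    beta, *_ = np.linalg.lstsq(design * sqrt_w[:, None], y * sqrt_w, rcond=None)
    return beta  # [intercept, slope]

# Synthetic example (assumed data): the effect of one predictor on the
# response increases from west to east across a unit square.
rng = np.random.default_rng(0)
n = 500
coords = rng.uniform(0.0, 1.0, size=(n, 2))
predictor = rng.normal(size=n)
true_slope = 1.0 + coords[:, 0]
y = true_slope * predictor + rng.normal(scale=0.1, size=n)

west = np.array([0.1, 0.5])
east = np.array([0.9, 0.5])
beta_west = gwr_coefficients(coords, predictor, y, west, bandwidth=0.15)
beta_east = gwr_coefficients(coords, predictor, y, east, bandwidth=0.15)

# With a very large bandwidth all weights are nearly equal, so the local fit
# approaches a single global linear regression with fixed coefficients.
beta_global_west = gwr_coefficients(coords, predictor, y, west, bandwidth=1e6)
beta_global_east = gwr_coefficients(coords, predictor, y, east, bandwidth=1e6)

print("local slope, west:", beta_west[1])
print("local slope, east:", beta_east[1])
print("near-global slope:", beta_global_west[1])
```

In this sketch the local slope differs between the two target locations, whereas the near-global fit returns the same coefficient everywhere, which is the fixed-relationship assumption the abstract attributes to earlier LUR models. In practice the bandwidth would be chosen by cross-validation or an information criterion rather than set by hand.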

Keywords: Geographically and temporally weighted regression; Land-use regression; Random forest; Spatially varying coefficient; Spatiotemporal variation.