Comparison of Population-Weighted Exposure Estimates of Air Pollutants Based on Multiple Geostatistical Models in Beijing, China

Yinghan Wu; Jia Xu; Ziqi Liu; Bin Han; Wen Yang; Zhipeng Bai

doi:10.3390/toxics12030197

Toxics. 2024 Mar 1;12(3):197. doi: 10.3390/toxics12030197.

Authors

Yinghan Wu¹, Jia Xu¹, Ziqi Liu¹,², Bin Han¹, Wen Yang¹, Zhipeng Bai¹,²

Affiliations

¹ State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing 100012, China.
² Department of Environmental & Occupational Health Sciences, School of Public Health, University of Washington, Seattle, WA 98105, USA.

Abstract

Various geostatistical models have been used in epidemiological research to estimate ambient air pollutant exposures at fine spatial scales. Few studies, however, have investigated how the choice of exposure model affects population-weighted exposure estimates and the potential exposure misclassification that results across modeling approaches. This study developed spatial models for NO₂ and PM₂.₅ and conducted an exposure assessment in Beijing, China. It explored three spatial modeling approaches (variable dimension reduction, machine learning, and conventional linear regression) and compared them in terms of cross-validated (CV) model performance and population-weighted exposure estimates. Specifically, partial least squares (PLS) regression, random forest (RF), and supervised linear regression (SLR) models were each developed within an ordinary kriging (OK) framework for NO₂ and PM₂.₅. Model performance was evaluated using the mean squared error-based R² (R²ₘₛₑ) and the root mean squared error (RMSE) from leave-one-site-out cross-validation (LOOCV). The models were then used to predict ambient exposure levels across the urban area and to estimate the misclassification of population-weighted exposure estimates, by quartile, between the models. The PLS-OK models for NO₂ and PM₂.₅ outperformed the other models, with LOOCV R²ₘₛₑ values of 0.82 and 0.81, respectively. For NO₂, the population-weighted exposure estimates from the PLS-OK and RF-OK models showed the lowest quartile misclassification. For PM₂.₅, the potential misclassification was comparable across the three models. These findings indicate that exposure misclassification arising from the choice of modeling approach should be carefully considered, and the resulting bias should be evaluated in epidemiological studies.
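The evaluation and comparison steps summarized in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: every function name is hypothetical, the single-predictor `fit_simple_ols` merely stands in for the PLS-OK, RF-OK, and SLR-OK models (no kriging is performed), and quartile misclassification is assumed here to mean the (optionally population-weighted) share of spatial units that two models place in different exposure quartiles; the paper's exact definition may differ. R²ₘₛₑ is computed as 1 − MSE/Var(observed), so, unlike a squared correlation, it can be negative when predictions do worse than the observed mean.

```python
# Hypothetical sketch: LOOCV metrics, population-weighted exposure, and
# quartile misclassification. The OLS model is an assumed stand-in only.
import math
from typing import Callable, List, Sequence


def fit_simple_ols(x: Sequence[float], y: Sequence[float]) -> Callable[[float], float]:
    """Assumed stand-in model: single-predictor ordinary least squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x0: intercept + slope * x0


def loocv_predictions(x, y, fit=fit_simple_ols) -> List[float]:
    """Leave-one-site-out CV: refit without site i, then predict site i."""
    preds = []
    for i in range(len(y)):
        model = fit(list(x[:i]) + list(x[i + 1:]), list(y[:i]) + list(y[i + 1:]))
        preds.append(model(x[i]))
    return preds


def r2_mse(obs, pred) -> float:
    """MSE-based R^2: 1 - MSE / variance of the observations."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mse = sum((o - p) ** 2 for o, p in zip(obs, pred)) / n
    var = sum((o - mean_obs) ** 2 for o in obs) / n
    return 1.0 - mse / var


def rmse(obs, pred) -> float:
    """Root mean squared error between observations and predictions."""
    n = len(obs)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / n)


def population_weighted_exposure(conc, pop) -> float:
    """sum(P_i * C_i) / sum(P_i) over grid cells or other spatial units."""
    return sum(c * p for c, p in zip(conc, pop)) / sum(pop)


def quartile_labels(values) -> List[int]:
    """Assign each unit a quartile 0-3 by rank (ties broken by position)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = 4 * rank // n
    return labels


def quartile_misclassification(pred_a, pred_b, weights=None) -> float:
    """Assumed definition: weighted share of units in different quartiles."""
    qa, qb = quartile_labels(pred_a), quartile_labels(pred_b)
    w = weights if weights is not None else [1.0] * len(pred_a)
    return sum(wi for wi, a, b in zip(w, qa, qb) if a != b) / sum(w)


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0]   # toy covariate values at monitoring sites
    y = [3.1, 4.9, 7.2, 8.8, 11.1]  # toy observed concentrations
    preds = loocv_predictions(x, y)
    print("R2_mse:", r2_mse(y, preds), "RMSE:", rmse(y, preds))
```

Because `loocv_predictions` accepts any function that maps a training set to a prediction function, the same loop and metrics could be reused to compare several candidate models on equal terms, which is the spirit of the cross-model comparison described above.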

Keywords: NO2; PM2.5; exposure estimates; geostatistical modeling approach; misclassification.

Grants and funding

2019YFE0115100/National Key Research and Development Program funded by the Ministry of Science and Technology of the People's Republic of China