Comparison of Population-Weighted Exposure Estimates of Air Pollutants Based on Multiple Geostatistical Models in Beijing, China

Toxics. 2024 Mar 1;12(3):197. doi: 10.3390/toxics12030197.

Abstract

Various geostatistical models have been used in epidemiological research to evaluate ambient air pollutant exposures at a fine spatial scale. Few studies, however, have investigated how the choice of exposure model affects population-weighted exposure estimates and the potential misclassification that results across modeling approaches. This study developed spatial models for NO2 and PM2.5 and conducted an exposure assessment in Beijing, China. It explored three spatial modeling approaches: variable dimension reduction, machine learning, and conventional linear regression, and compared them in terms of cross-validation (CV) performance and population-weighted exposure estimates. Specifically, partial least squares (PLS) regression, random forests (RF), and supervised linear regression (SLR) models were each developed within an ordinary kriging (OK) framework for both pollutants. The mean squared error-based R2 (R2mse) and root mean squared error (RMSE) in leave-one-site-out cross-validation (LOOCV) were used to evaluate model performance. The models were then used to predict ambient exposure levels in the urban area and to estimate the misclassification, in quartiles, of population-weighted exposure estimates between models. The PLS-OK models for NO2 and PM2.5, with LOOCV R2mse of 0.82 and 0.81, respectively, outperformed the other models. The population-weighted NO2 exposure estimates from the PLS-OK and RF-OK models exhibited the lowest misclassification in quartiles. For PM2.5, the estimates of potential misclassification were comparable across the three models. These findings indicate that the exposure misclassification arising from the choice of modeling approach should be carefully considered, and the resulting bias needs to be evaluated in epidemiological studies.
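The evaluation quantities named in the abstract (R2mse, RMSE, population-weighted exposure, and quartile misclassification) can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes that R2mse is defined as max(0, 1 - MSE / Var(observed)), that population-weighted exposure is the population-weighted mean of predicted concentrations, and that misclassification is the share of spatial units assigned to different quartiles by two models. All function names are hypothetical, and the exact definitions used in the study may differ.

```python
import numpy as np

def rmse(obs, pred):
    """Root mean squared error between observed and predicted values."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.sqrt(np.mean((obs - pred) ** 2)))

def r2_mse(obs, pred):
    """MSE-based R2 (assumed definition): max(0, 1 - MSE / Var(obs))."""
    obs = np.asarray(obs, float)
    mse = rmse(obs, pred) ** 2
    return float(max(0.0, 1.0 - mse / np.var(obs)))

def population_weighted_mean(conc, pop):
    """Population-weighted exposure: sum(pop * conc) / sum(pop)."""
    return float(np.average(np.asarray(conc, float), weights=np.asarray(pop, float)))

def quartile_labels(values):
    """Assign each value to a quartile (0..3) of its own distribution."""
    values = np.asarray(values, float)
    cuts = np.quantile(values, [0.25, 0.5, 0.75])
    return np.searchsorted(cuts, values, side="right")

def quartile_misclassification(est_a, est_b):
    """Share of units placed in different quartiles by two sets of estimates."""
    return float(np.mean(quartile_labels(est_a) != quartile_labels(est_b)))
```

In a leave-one-site-out setting, `pred` would hold the prediction for each monitoring site obtained from a model fitted without that site. `est_a` and `est_b` would hold exposure estimates for the same spatial units from two different models.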

Keywords: NO2; PM2.5; exposure estimates; geostatistical modeling approach; misclassification.