Assessing the Spatiotemporal Characteristics, Factor Importance, and Health Impacts of Air Pollution in Seoul by Integrating Machine Learning into Land-Use Regression Modeling at High Spatiotemporal Resolutions

Environ Sci Technol. 2023 Jan 11. doi: 10.1021/acs.est.2c03027. Online ahead of print.

Abstract

Previous studies have characterized spatial patterns of air pollution with land-use regression (LUR) models. However, the spatiotemporal characteristics of air pollution, the contribution of various factors to them, and the resultant health impacts have yet to be evaluated comprehensively. This study integrates machine learning (random forest) into LUR modeling (LURF), with intensive evaluation, to develop high spatiotemporal resolution prediction models that estimate daily and diurnal PM2.5 and NO2 in Seoul, South Korea, at a spatial resolution of 500 m for one year (2019), and then evaluates the contribution of driving factors and quantifies the resultant premature mortality. Our results show that incorporating the random forest algorithm into our LUR model improves model performance. Meteorological conditions strongly influence the daily models, while land-use factors play important roles in the diurnal models. Our health assessment using dynamic population data estimates that PM2.5 and NO2 pollution combined caused a total of 11,183 (95% CI: 5837-16,354) premature deaths in Seoul in 2019, of which 64.9% are due to PM2.5 and the remainder are attributable to NO2. The air pollution-attributable health impacts in Seoul are largely caused by cardiovascular diseases, including stroke. This study highlights the significant spatiotemporal variations and health impacts of PM2.5 and NO2 in Seoul, providing essential data for epidemiological research and air quality management.
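
As a rough worked example of the mortality split reported above, the short Python sketch below converts the stated total (11,183) and PM2.5 share (64.9%) into approximate pollutant-specific counts. The function and variable names are illustrative assumptions, not part of the study, and because the reported percentage is itself rounded, the resulting counts are approximate. The confidence interval is not split, since it cannot be apportioned by simple multiplication.

```python
# Figures taken from the abstract; names are hypothetical, for illustration only.
TOTAL_DEATHS = 11_183   # combined PM2.5 + NO2 premature deaths, Seoul, 2019
PM25_SHARE = 0.649      # fraction of the total attributed to PM2.5


def split_deaths(total: int, pm25_share: float) -> tuple[int, int]:
    """Return approximate (PM2.5, NO2) counts from a total and a PM2.5 share."""
    pm25 = round(total * pm25_share)  # rounded to whole deaths
    no2 = total - pm25                # the remainder is attributed to NO2
    return pm25, no2


pm25_deaths, no2_deaths = split_deaths(TOTAL_DEATHS, PM25_SHARE)
no2_share = 1 - PM25_SHARE

print(f"PM2.5: about {pm25_deaths} deaths ({PM25_SHARE:.1%})")
print(f"NO2:   about {no2_deaths} deaths ({no2_share:.1%})")
```

Under these figures, roughly 7,258 deaths are attributable to PM2.5 and roughly 3,925 (about 35.1%) to NO2.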

Keywords: air pollution; dynamic population; incidence; land-use regression model; machine learning; mortality.