Estimating ground-level PM2.5 using subset regression model and machine learning algorithms in Asian megacity, Dhaka, Bangladesh

Air Qual Atmos Health. 2023;16(6):1117-1139. doi: 10.1007/s11869-023-01329-w. Epub 2023 Feb 25.

Abstract

Fine particulate matter (PM2.5) has become a prominent pollutant as a result of rapid economic development, urbanization, industrialization, and transport activities, and it has serious adverse effects on human health and the environment. Many studies have employed traditional statistical models and remote-sensing technologies to estimate PM2.5 concentrations. However, statistical models have produced inconsistent PM2.5 predictions, whereas machine learning algorithms have excellent predictive capacity, and little research has examined the complementary advantages of these approaches. The present study applied a best subset regression model and machine learning approaches, including random tree, additive regression, reduced error pruning tree, and random subspace, to estimate ground-level PM2.5 concentrations over Dhaka. The study used these machine learning algorithms to measure the effects of meteorological factors and air pollutants (NOX, SO2, CO, and O3) on the dynamics of PM2.5 in Dhaka from 2012 to 2020. Results showed that the best subset regression model performed well in forecasting PM2.5 concentrations at all sites when it combined precipitation, relative humidity, temperature, wind speed, SO2, NOX, and O3. Precipitation, relative humidity, and temperature were negatively correlated with PM2.5. Pollutant concentrations were much higher at the beginning and end of the year. Random subspace was the optimal model for estimating PM2.5 because it had the lowest statistical error metrics of the models compared. This study therefore suggests ensemble learning models for estimating PM2.5 concentrations. The findings will help quantify exposure to ground-level PM2.5 and support recommendations for regional government actions to prevent and regulate PM2.5 air pollution.
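
As an illustration of the best subset regression idea named above, the following is a minimal sketch and not the authors' implementation. It fits ordinary least squares on every non-empty subset of candidate predictors and keeps the subset with the lowest Bayesian information criterion (BIC). The abstract does not state which selection criterion was used, so BIC is an assumption. The data are synthetic, and the function name `best_subset`, the variable names, and the coefficient values are all hypothetical.

```python
import itertools

import numpy as np


def best_subset(X, y, names):
    """Fit OLS on every non-empty predictor subset and return
    (bic, selected_names, coefficients) for the lowest-BIC subset.
    BIC is an assumed criterion; the source does not specify one."""
    n, p = X.shape
    best = None
    for k in range(1, p + 1):
        for cols in itertools.combinations(range(p), k):
            # Design matrix: intercept column plus the chosen predictors.
            A = np.column_stack([np.ones(n), X[:, list(cols)]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = float(np.sum((y - A @ coef) ** 2))
            # Gaussian BIC up to a constant; k + 1 fitted parameters.
            bic = n * np.log(rss / n) + (k + 1) * np.log(n)
            if best is None or bic < best[0]:
                best = (bic, [names[c] for c in cols], coef)
    return best


# Synthetic, standardized predictors (NOT data from the study).
rng = np.random.default_rng(0)
n = 300
names = ["temperature", "relative_humidity", "wind_speed", "unrelated"]
X = rng.normal(size=(n, len(names)))

# Hypothetical response: PM2.5 falls as temperature and humidity rise,
# mirroring only the sign of the correlations reported in the abstract.
y = 80.0 - 6.0 * X[:, 0] - 4.0 * X[:, 1] + rng.normal(scale=1.0, size=n)

bic, selected, coef = best_subset(X, y, names)
print("selected predictors:", selected)
print("intercept and slopes:", np.round(coef, 2))
```

Exhaustive search evaluates 2^p - 1 models, which is practical for the handful of meteorological and pollutant predictors listed in the abstract but grows quickly with more candidates.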

Supplementary information: The online version contains supplementary material available at 10.1007/s11869-023-01329-w.

Keywords: Air quality management; Machine learning; Meteorological factors; PM2.5 dynamics; Subset regression model.