Predicting intraurban PM2.5 concentrations using enhanced machine learning approaches and incorporating human activity patterns

Mehdi Ashayeri; Narjes Abbasabadi; Mohammad Heidarinejad; Brent Stephens

doi:10.1016/j.envres.2020.110423

Environ Res. 2021 May:196:110423. doi: 10.1016/j.envres.2020.110423. Epub 2020 Nov 4.

Authors

Mehdi Ashayeri¹, Narjes Abbasabadi¹, Mohammad Heidarinejad², Brent Stephens³

Affiliations

¹ College of Architecture, Illinois Institute of Technology, Chicago, IL, USA.
² Department of Civil, Architectural, and Environmental Engineering, Illinois Institute of Technology, Chicago, IL, USA.
³ Department of Civil, Architectural, and Environmental Engineering, Illinois Institute of Technology, Chicago, IL, USA. Electronic address: brent@iit.edu.

PMID: 33157105

Abstract

Urban areas contribute substantially to human exposure to ambient air pollution. Numerous statistical prediction models have been used to estimate ambient concentrations of fine particulate matter (PM2.5) and other pollutants in urban environments, with some incorporating machine learning (ML) algorithms to improve predictive power. However, many ML approaches for predicting ambient pollutant concentrations to date have used principal component analysis (PCA) with traditional regression algorithms to explore linear correlations between variables and to reduce the dimensionality of the data. Moreover, while most urban air quality prediction models have traditionally incorporated explanatory variables such as meteorological, land use, transportation/mobility, and/or co-pollutant factors, recent research has shown that local emissions from building infrastructure may also be useful factors to consider in estimating urban pollutant concentrations. Here we propose an enhanced ML approach for predicting urban ambient PM2.5 concentrations that hybridizes cascade and PCA methods to reduce the dimensionality of the data space and explore nonlinear effects between variables. We test the approach using different durations of time series air quality datasets of hourly PM2.5 concentrations from three air quality monitoring sites in different urban neighborhoods in Chicago, IL to explore the influence of dynamic human-related factors, including mobility (i.e., traffic) and building occupancy patterns, on model performance. We test nine state-of-the-art ML algorithms to find the most effective algorithm for modeling intraurban PM2.5 variations, and we explore the relative importance of all sets of factors on intraurban air quality model performance. Results demonstrate that Gaussian-kernel support vector regression (SVR) was the most effective ML algorithm tested, improving accuracy by 118% compared to a traditional multiple linear regression (MLR) approach. Incorporating the enhanced approach with the SVR algorithm increased model performance by up to 18.4% for year-long and 98.7% for month-long hourly datasets, respectively. Incorporating assumptions for human occupancy patterns in dominant building typologies resulted in improvements in model performance of between 4% and 37%. Combined, these innovations can be used to improve the performance and accuracy of urban air quality prediction models compared to conventional approaches.
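
Two quantities named in the abstract can be made concrete with a short sketch. The first is the Gaussian (RBF) kernel, k(x, y) = exp(-gamma * ||x - y||^2), which is the standard similarity function behind Gaussian-kernel SVR. The second is a relative-improvement percentage of the kind reported above. The sketch assumes those percentages are relative changes in a higher-is-better score; the abstract does not state which metric was used. The function names, the default gamma, and all numeric inputs are hypothetical and chosen only for illustration; none of them come from the paper.

```python
import math


def gaussian_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * squared Euclidean distance).

    gamma=0.5 is an arbitrary illustrative default, not a value from the paper.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def relative_improvement(baseline, improved):
    """Percent change of `improved` relative to `baseline`.

    Assumes a higher-is-better score; the abstract does not name the metric.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (improved - baseline) / abs(baseline)


if __name__ == "__main__":
    # Hypothetical feature vectors (for example, two scaled predictors).
    near = gaussian_kernel([0.0, 0.0], [0.1, 0.1])
    far = gaussian_kernel([0.0, 0.0], [2.0, 2.0])
    print(f"kernel similarity, nearby points:  {near:.4f}")
    print(f"kernel similarity, distant points: {far:.4f}")

    # Hypothetical scores, NOT results from the paper.
    print(f"relative improvement: {relative_improvement(0.40, 0.60):.1f}%")
```

Because the kernel value decays smoothly with distance, a model built on it can weight nearby observations more heavily than distant ones, which is one way a kernel method can capture nonlinear effects that a multiple linear regression cannot.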

Keywords: Air pollution modeling; Artificial intelligence; Human activity; Outdoor air quality; Statistical prediction model.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Air Pollutants* / analysis
Air Pollution* / analysis
Environmental Monitoring
Human Activities
Humans
Machine Learning
Particulate Matter / analysis

Substances

Air Pollutants
Particulate Matter