Predicting intraurban PM2.5 concentrations using enhanced machine learning approaches and incorporating human activity patterns

Environ Res. 2021 May:196:110423. doi: 10.1016/j.envres.2020.110423. Epub 2020 Nov 4.

Abstract

Urban areas contribute substantially to human exposure to ambient air pollution. Numerous statistical prediction models have been used to estimate ambient concentrations of fine particulate matter (PM2.5) and other pollutants in urban environments, with some incorporating machine learning (ML) algorithms to improve predictive power. However, many ML approaches for predicting ambient pollutant concentrations to date have used principal component analysis (PCA) with traditional regression algorithms to explore linear correlations between variables and to reduce the dimensionality of the data. Moreover, while most urban air quality prediction models have traditionally incorporated explanatory variables such as meteorological, land use, transportation/mobility, and/or co-pollutant factors, recent research has shown that local emissions from building infrastructure may also be useful factors to consider in estimating urban pollutant concentrations. Here we propose an enhanced ML approach for predicting urban ambient PM2.5 concentrations that hybridizes cascade and PCA methods to reduce the dimensionality of the data space and explore nonlinear effects between variables. We test the approach using time series of hourly PM2.5 concentrations of different durations from three air quality monitoring sites in different urban neighborhoods in Chicago, IL, to explore the influence of dynamic human-related factors, including mobility (i.e., traffic) and building occupancy patterns, on model performance. We test nine state-of-the-art ML algorithms to find the most effective one for modeling intraurban PM2.5 variations, and we explore the relative importance of all sets of factors on intraurban air quality model performance. Results demonstrate that Gaussian-kernel support vector regression (SVR) was the most effective ML algorithm tested, improving accuracy by 118% compared to a traditional multiple linear regression (MLR) approach. Combining the enhanced approach with the SVR algorithm increased model performance by up to 18.4% for year-long and up to 98.7% for month-long hourly datasets. Incorporating assumptions for human occupancy patterns in dominant building typologies improved model performance by between 4% and 37%. Combined, these innovations can be used to improve the performance and accuracy of urban air quality prediction models compared to conventional approaches.
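
The kind of comparison described above, an MLR baseline against a Gaussian-kernel SVR fitted on PCA-reduced predictors, can be sketched with scikit-learn. This is only an illustration and not the authors' method: the cascade step of their hybrid approach is not reproduced, the data are synthetic, and the predictor names (temperature, wind speed, traffic, occupancy), the function names, and the hyperparameters are all assumptions made for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def make_synthetic_hourly_data(n_hours=2000, seed=0):
    """Build synthetic hourly predictors and a PM2.5-like target.

    Everything here is made up for illustration; it is not monitoring data.
    """
    rng = np.random.default_rng(seed)
    hour = np.arange(n_hours) % 24
    temperature = 15 + 10 * np.sin(2 * np.pi * hour / 24) + rng.normal(0, 2, n_hours)
    wind_speed = np.abs(rng.normal(3, 1.5, n_hours))
    # Hypothetical traffic proxy with morning and evening peaks.
    traffic = (
        100
        + 80 * np.exp(-((hour - 8) ** 2) / 8)
        + 80 * np.exp(-((hour - 17) ** 2) / 8)
        + rng.normal(0, 10, n_hours)
    )
    # Hypothetical building occupancy schedule: occupied 08:00 to 18:00.
    occupancy = ((hour >= 8) & (hour <= 18)).astype(float)
    X = np.column_stack([temperature, wind_speed, traffic, occupancy])
    # Nonlinear target so that a kernel method has something to capture.
    y = (
        8
        + 0.05 * traffic / (1 + wind_speed)
        + 3 * occupancy
        + 0.01 * (temperature - 15) ** 2
        + rng.normal(0, 1, n_hours)
    )
    return X, y


def compare_models(X, y, n_components=3, train_fraction=0.8):
    """Fit an MLR baseline and a PCA + RBF-kernel SVR; return test R^2 per model."""
    split = int(len(y) * train_fraction)  # chronological split, no shuffling
    X_train, X_test = X[:split], X[split:]
    y_train, y_test = y[:split], y[split:]
    models = {
        "MLR": make_pipeline(StandardScaler(), LinearRegression()),
        "PCA+SVR": make_pipeline(
            StandardScaler(),
            PCA(n_components=n_components),
            SVR(kernel="rbf", C=10.0, epsilon=0.1),  # assumed, untuned values
        ),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = r2_score(y_test, model.predict(X_test))
    return scores


if __name__ == "__main__":
    X, y = make_synthetic_hourly_data()
    for name, score in compare_models(X, y).items():
        print(f"{name}: R^2 = {score:.3f}")
```

The chronological train/test split is a design choice for time series data, so the models are scored on hours that come after the training period. Scores obtained on this synthetic data say nothing about the improvements reported in the abstract.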

Keywords: Air pollution modeling; Artificial intelligence; Human activity; Outdoor air quality; Statistical prediction model.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Air Pollutants* / analysis
  • Air Pollution* / analysis
  • Environmental Monitoring
  • Human Activities
  • Humans
  • Machine Learning
  • Particulate Matter / analysis

Substances

  • Air Pollutants
  • Particulate Matter