Spatio-temporal modeling of PM2.5 risk mapping using three machine learning algorithms

Environ Pollut. 2021 Nov 15;289:117859. doi: 10.1016/j.envpol.2021.117859. Epub 2021 Jul 28.

Abstract

Urban air pollution is one of the most critical issues affecting the environment, community health, the economy, and the management of urban areas. From a public health perspective, PM2.5 is one of the primary air pollutants, especially in the Tehran metropolis. Because PM2.5 follows different patterns in different seasons, spatio-temporal modeling and identification of high-risk areas are needed to reduce its effects. The purpose of this study was to model PM2.5 spatio-temporally and to prepare PM2.5 risk maps using three machine learning algorithms (random forest (RF), AdaBoost, and stochastic gradient descent (SGD)) in the metropolis of Tehran, Iran. In the first step, the seasonal average PM2.5 for spring, summer, autumn, and winter was used as the dependent variable. Then, using remote sensing (RS) and a geographic information system (GIS), independent variables such as temperature, maximum temperature, minimum temperature, wind speed, rainfall, humidity, normalized difference vegetation index (NDVI), population density, street density, and distance to industrial centers were prepared as seasonal averages. For modeling with the machine learning algorithms, 70% of the data were used for training and 30% for validation. The frequency ratio (FR) model was used to calculate the spatial relationship between PM2.5 and the effective parameters, and its output served as input to the machine learning algorithms. Finally, spatio-temporal modeling and PM2.5 risk mapping were performed with the three algorithms. The area under the receiver operating characteristic (ROC) curve (AUC) showed that the RF algorithm had the highest modeling accuracy, with values of 0.926, 0.94, 0.949, and 0.949 for spring, summer, autumn, and winter, respectively. According to the RF model, NDVI was the most important variable in spring and autumn, while temperature and distance to industrial centers were the most important variables in summer and winter, respectively. The results showed that PM2.5 risk was highest in autumn, followed by winter, summer, and spring.
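The abstract does not spell out how the frequency ratio (FR) is computed or how it feeds the models. Below is a minimal sketch, assuming the FR definition commonly used in susceptibility mapping (the share of high-PM2.5 cells that fall in a factor class, divided by the share of all cells in that class) and assuming each factor is binned into classes whose FR values then replace the raw values as model inputs. The function names `frequency_ratio` and `encode_with_fr`, and the toy data, are hypothetical and are not the authors' code.

```python
from collections import Counter


def frequency_ratio(classes, is_high):
    """Frequency ratio (FR) for each class of one conditioning factor.

    Hypothetical inputs (assumed, not from the study):
      classes : class label of each cell, e.g. binned NDVI
      is_high : 1 if the cell is a high-PM2.5 cell, else 0

    FR(c) = (share of high-PM2.5 cells falling in class c)
            / (share of all cells falling in class c)
    FR > 1 means class c co-occurs with high PM2.5 more often than average.
    """
    if len(classes) != len(is_high):
        raise ValueError("classes and is_high must have the same length")
    total_cells = len(classes)
    total_high = sum(is_high)
    if total_high == 0:
        raise ValueError("need at least one high-PM2.5 cell")
    cells_per_class = Counter(classes)
    high_per_class = Counter(c for c, h in zip(classes, is_high) if h)
    return {
        c: (high_per_class[c] / total_high) / (n / total_cells)
        for c, n in cells_per_class.items()
    }


def encode_with_fr(classes, fr_table):
    """Replace each class label with its FR value (assumed model input)."""
    return [fr_table[c] for c in classes]


# Toy example: 8 cells binned by NDVI, 3 of them flagged as high PM2.5.
ndvi_class = ["low", "low", "low", "mid", "mid", "high", "high", "high"]
high_pm = [1, 1, 0, 1, 0, 0, 0, 0]
fr = frequency_ratio(ndvi_class, high_pm)
print({c: round(v, 3) for c, v in fr.items()})
# prints {'low': 1.778, 'mid': 1.333, 'high': 0.0}
```

In this toy case the low-NDVI class holds 2/3 of the high-PM2.5 cells but only 3/8 of all cells, so its FR is (2/3)/(3/8) ≈ 1.778.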
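The training, validation, and comparison steps can be sketched with scikit-learn. This sketch assumes the sample points were labeled high (1) or low (0) PM2.5 so that ROC AUC applies, that the 70/30 split was stratified, and that the hyperparameters shown are merely illustrative; the abstract gives none of these details. `compare_models`, the feature names, and the synthetic demo data are hypothetical and do not reproduce the reported results.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical column names for the ten seasonal-average factors.
FEATURES = ["temperature", "max_temperature", "min_temperature", "wind_speed",
            "rainfall", "humidity", "ndvi", "population_density",
            "street_density", "distance_to_industry"]


def compare_models(X, y, seed=0):
    """Fit RF, AdaBoost and SGD on a 70/30 split; return validation AUCs.

    X: (assumed FR-encoded) factors for one season, one row per sample point.
    y: assumed binary label, 1 = high-PM2.5 point, 0 = low.
    """
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y)
    models = {
        "RF": RandomForestClassifier(n_estimators=300, random_state=seed),
        "AdaBoost": AdaBoostClassifier(random_state=seed),
        # SGD is scale-sensitive, so standardize the features first.
        "SGD": make_pipeline(StandardScaler(), SGDClassifier(random_state=seed)),
    }
    aucs, fitted = {}, {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        if name == "SGD":
            # Default hinge loss gives no probabilities; use decision scores.
            scores = model.decision_function(X_val)
        else:
            scores = model.predict_proba(X_val)[:, 1]
        aucs[name] = roc_auc_score(y_val, scores)
        fitted[name] = model
    return aucs, fitted


# Purely synthetic demo data (not from the study): uniform stand-ins for the
# FR-encoded factors, with labels driven by columns 6 (ndvi) and 9
# (distance_to_industry) plus noise.
rng = np.random.default_rng(42)
X_demo = rng.random((400, len(FEATURES)))
logit = 3 * (0.5 - X_demo[:, 6]) + 3 * (0.5 - X_demo[:, 9]) + rng.normal(0, 0.5, 400)
y_demo = (logit > 0).astype(int)

aucs, fitted = compare_models(X_demo, y_demo)
print({name: round(auc, 3) for name, auc in aucs.items()})
importances = fitted["RF"].feature_importances_
print("Most important factor (RF):", FEATURES[int(np.argmax(importances))])
```

SGD is wrapped with `StandardScaler` because gradient-based linear models are sensitive to feature scale, whereas tree ensembles are not. Using signed decision scores for SGD is sufficient because ROC AUC depends only on how the validation points are ranked.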

Keywords: GIS; Machine learning algorithms; PM2.5; Remote sensing; Spatio-temporal modeling.

MeSH terms

  • Air Pollutants* / analysis
  • Air Pollution* / analysis
  • Environmental Monitoring
  • Iran
  • Machine Learning
  • Particulate Matter / analysis
  • Seasons

Substances

  • Air Pollutants
  • Particulate Matter