Estimation of urban AQI based on interpretable machine learning

Environ Sci Pollut Res Int. 2023 Sep;30(42):96562-96574. doi: 10.1007/s11356-023-29336-5. Epub 2023 Aug 14.

Abstract

Air pollution is an increasingly serious problem. Accurate and efficient prediction of air quality can support pollution prevention and help improve the quality of human life. The air quality index (AQI) is a dimensionless measure that describes air quality quantitatively. In this study, machine learning (ML) methods were used to estimate the AQI of Shijiazhuang, China, with pollutant concentrations and meteorological factors as model inputs. Specifically, eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), and Random Forest (RF) models were used. The experimental results show that the XGBoost model captures the AQI variation trend well: its R² is 0.929, which is 0.3% and 2.3% higher than the R² of the RF and LightGBM models, respectively. In addition, a SHAP (SHapley Additive exPlanations)-based model interpretation shows that the key factors driving AQI variation are PM2.5 and PM10, which contribute positively to AQI, and that AQI is less sensitive to meteorological factors. Finally, Beijing, Shanghai, Xi'an, and Guangzhou were selected to test the model's validity, and model performance remained good. Our study shows that applying ML approaches to air quality prediction is beneficial for efficiently assessing cities' future air quality.
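
The models in the abstract are compared by their coefficient of determination, R² = 1 - SS_res / SS_tot. As a minimal sketch of that metric only, the snippet below computes R² in plain Python. The function name `r2_score` and the AQI values are assumptions made up for illustration; they are not the study's code or data.

```python
from statistics import fmean


def r2_score(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = fmean(observed)
    # Residual sum of squares: error left over after the model's prediction.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when the observations are constant")
    return 1.0 - ss_res / ss_tot


# Hypothetical daily AQI values, invented for illustration only.
observed_aqi = [50.0, 100.0, 150.0, 200.0]
predicted_aqi = [60.0, 90.0, 160.0, 190.0]

score = r2_score(observed_aqi, predicted_aqi)
print(round(score, 3))  # 0.968
```

An R² of 1 means the predictions match the observations exactly, while a model that always predicts the observed mean scores 0. This is why values such as the reported 0.929 are read as the share of AQI variance the model explains.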

Keywords: AQI; Machine learning; Prediction; SHAP.

MeSH terms

  • Air Pollutants* / analysis
  • Air Pollution* / analysis
  • Beijing
  • China
  • Cities
  • Environmental Monitoring / methods
  • Humans
  • Machine Learning
  • Particulate Matter / analysis

Substances

  • Air Pollutants
  • Particulate Matter