A full-coverage estimation of PM2.5 concentrations using a hybrid XGBoost-WD model and WRF-simulated meteorological fields in the Yangtze River Delta Urban Agglomeration, China

Environ Res. 2022 Jan:203:111799. doi: 10.1016/j.envres.2021.111799. Epub 2021 Jul 31.

Abstract

Despite the state-of-the-art performance of machine learning in PM2.5 estimation, the underestimation of high PM2.5 values and the non-random missingness of aerosol optical depth (AOD) data remain major obstacles. By incorporating wavelet decomposition (WD) into extreme gradient boosting (XGBoost), a hybrid XGBoost-WD model was established to obtain full-coverage PM2.5 estimates at 3-km spatial resolution in the Yangtze River Delta Urban Agglomeration (YRDUA). In this study, 3-km-resolution meteorological fields simulated by WRF, along with AOD derived from the Moderate Resolution Imaging Spectroradiometer (MODIS), served as explanatory variables. Model MW and Model NW were developed using XGBoost-WD for the areas with and without AOD, respectively, to obtain full-coverage PM2.5 mapping in the YRDUA. The XGBoost-WD model performed well in estimating PM2.5, with R2 of 0.80 for Model MW and 0.87 for Model NW. Moreover, compared with the model without the WD step, the K-value of Model MW increased from 0.77 to 0.79 and that of Model NW increased from 0.81 to 0.86, indicating a reduction in PM2.5 underestimation. Owing to its better ability to capture abrupt changes in PM2.5 concentrations, the model could map the spatial evolution of PM2.5 during a typical pollution event more accurately. Finally, the variable importance analysis showed that the three most important variables in estimating the low-frequency coefficients of PM2.5 (PM2.5_A4) were temperature at 2 m (T2), day of year (DOY) and longitude (LON), while those for the high-frequency coefficients of PM2.5 (PM2.5_D) were CO, AOD and NO2. This study not only provides an effective solution to the PM2.5 underestimation and missing-AOD problems in PM2.5 estimation, but also proposes a new method to further refine the complex correlations between PM2.5 and some spatiotemporal variables.
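To illustrate the decomposition step that the hybrid model relies on, the following minimal sketch splits a short PM2.5 series into one low-frequency approximation (A4) and four sets of high-frequency detail coefficients (D1 to D4), then reconstructs the series exactly. It is an illustration only: the abstract does not name the wavelet family, so the Haar wavelet used here is an assumption, the four-level depth is inferred from the "PM2.5_A4" label, and the function names and the sample values are hypothetical. In a hybrid scheme of this kind, separate regressors would presumably be fitted to the low- and high-frequency parts and their estimates recombined; that modelling step is not shown.

```python
# Minimal pure-Python sketch of a 4-level Haar wavelet decomposition.
# Assumptions (not from the source): Haar wavelet, averaging normalisation,
# hypothetical function names, and made-up sample concentrations.

def haar_step(signal):
    """One decomposition level: pairwise means (approximation) and half-differences (detail)."""
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_decompose(signal, levels=4):
    """Return (A_levels, [D1, ..., D_levels]); D1 holds the finest-scale details."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


def haar_reconstruct(approx, details):
    """Invert haar_decompose by undoing the levels from coarsest to finest."""
    signal = list(approx)
    for detail in reversed(details):
        rebuilt = []
        for a, d in zip(signal, detail):
            rebuilt.extend([a + d, a - d])
        signal = rebuilt
    return signal


# Hypothetical daily PM2.5 values (ug/m3) containing an abrupt pollution episode.
pm25 = [35, 38, 40, 37, 36, 39, 120, 150, 145, 90, 50, 42, 40, 38, 37, 36]

a4, d_coeffs = haar_decompose(pm25, levels=4)
rebuilt = haar_reconstruct(a4, d_coeffs)
```

With this normalisation, A4 carries the smooth background level of the series, while the abrupt episode shows up mainly in the detail coefficients, which is why modelling the two parts separately can help with sharp peaks.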

Keywords: Full-coverage estimation; PM2.5; PM2.5 underestimation; Wavelet decomposition; XGBoost; YRDUA.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Aerosols / analysis
  • Air Pollutants* / analysis
  • Air Pollution* / analysis
  • China
  • Environmental Monitoring
  • Particulate Matter / analysis
  • Rivers

Substances

  • Aerosols
  • Air Pollutants
  • Particulate Matter