A spatiotemporal XGBoost model for PM2.5 concentration prediction and its application in Shanghai

Heliyon. 2023 Nov 23;9(12):e22569. doi: 10.1016/j.heliyon.2023.e22569. eCollection 2023 Dec.

Abstract

This paper constructs an analytical and forecasting framework to predict PM2.5 concentration levels for the 16 municipal districts of Shanghai. Through XGBoost parameter tuning, empirical mode decomposition, and model fusion, the framework improves the accuracy and stability of XGBoost predictions and reduces prediction deviation at extreme points. The main findings are as follows: 1) Compared with the original model, the goodness of fit of the modified XGBoost model on the test set increased by 17%, and the root mean square error decreased by 28%; 2) The PM2.5 concentration in Shanghai shows a significant seasonal (cyclical) effect with a fluctuation period of 3 months (one quarter), and extreme values occur significantly more often in winter than in other seasons; 3) Spatially, the PM2.5 concentration is higher in the central city of Shanghai than in the rural areas, and it decreases gradually from the city center toward the surrounding areas. The innovations and contributions are as follows: 1) The ensemble empirical mode decomposition (EEMD) algorithm, verified by SSA, was used to decompose the original model without reconstructing all subsequences, and variable weight assignment was used to obtain the best weighting among the parts of the hybrid model; 2) The city was divided according to administrative districts to avoid duplicate analysis when applying Kriging interpolation; 3) The inverse distance weighting (IDW) method was applied to verify the Kriging interpolation and increase accuracy; 4) Latitude and longitude were converted into the arc length along the corresponding spherical surface; 5) The hierarchical analysis method was used to rank the PM2.5 monitoring stations by importance, which can improve accuracy and achieve dimension reduction.

Keywords: Extreme gradient boosting model (XGBoost), PM2.5, Signal decomposition, Kriging interpolation.
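
Two of the spatial steps named in the abstract, converting latitude and longitude into an arc length on a sphere and cross-checking interpolated values with IDW, can be illustrated with a minimal sketch. The abstract does not give the exact formulas used, so the haversine formula, the mean Earth radius, the power parameter of 2, and the function names `arc_length_km` and `idw_estimate` are all assumptions made here for illustration, not the authors' implementation.

```python
import math

# Assumed mean Earth radius in kilometres; the paper's value is not stated.
EARTH_RADIUS_KM = 6371.0088


def arc_length_km(lat1, lon1, lat2, lon2, radius_km=EARTH_RADIUS_KM):
    """Great-circle arc length between two points given in degrees.

    Uses the haversine formula (an assumption; the abstract only says that
    coordinates were converted to spherical arc length).
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))


def idw_estimate(target, stations, power=2.0):
    """Inverse distance weighted estimate at `target` = (lat, lon).

    `stations` is a list of (lat, lon, value) tuples. The weights are
    1 / distance**power, with distance measured as spherical arc length.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for lat, lon, value in stations:
        d = arc_length_km(target[0], target[1], lat, lon)
        if d == 0.0:
            # Target coincides with a station: return its reading directly.
            return value
        w = 1.0 / d ** power
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total
```

In this sketch, a station that lies closer to the target along the sphere receives a larger weight, so an IDW estimate can serve as a simple independent check on a Kriging estimate at the same location. The station readings passed to `idw_estimate` would be hypothetical inputs supplied by the user.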