Explore spatio-temporal PM2.5 features in northern Taiwan using machine learning techniques

Sci Total Environ. 2020 Sep 20;736:139656. doi: 10.1016/j.scitotenv.2020.139656. Epub 2020 May 23.

Abstract

The complex mixture of local emission sources and regional transport of air pollutants makes accurate PM2.5 prediction a very challenging yet crucial task, especially under high pollution conditions. A symbolic representation of spatio-temporal PM2.5 features is key to effective air pollution regulatory plans that notify the public to take necessary precautions against air pollution. The self-organizing map (SOM) can cluster high-dimensional datasets into a meaningful topological map. This study implements the SOM to extract and clearly distinguish the spatio-temporal features of long-term regional PM2.5 concentrations in a visible two-dimensional topological map. The spatial distribution of the configured topological map is derived from the long-term datasets of 25 monitoring stations in northern Taiwan using the Kriging method, and the temporal behavior of PM2.5 concentrations at various time scales (i.e., yearly, seasonal, and hourly) is explored in detail. Finally, we establish a machine learning model to predict PM2.5 concentrations during high pollution events. The analytical results indicate that: (1) high population density and heavy traffic load correspond to high PM2.5 concentrations; (2) seasonal change has an obvious effect on PM2.5 concentration variation; and (3) the key input variables of the prediction model identified by the Gamma Test can improve the model's reliability and accuracy for multi-step-ahead PM2.5 prediction. The results demonstrate that machine learning techniques can skillfully summarize and visibly present the clustered spatio-temporal PM2.5 features as well as improve air quality prediction accuracy.
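The SOM clustering step described above can be illustrated with a minimal sketch of the standard online Kohonen algorithm in NumPy. This is not the authors' implementation: the function names `train_som` and `map_to_nodes`, the 3 x 3 grid, the linearly decaying learning rate, the exponentially shrinking Gaussian neighborhood, and the synthetic "25-station" input matrix (two well-separated low/high PM2.5 regimes) are all hypothetical choices made for illustration.

```python
import numpy as np

def train_som(data, rows=3, cols=3, n_iter=2000, lr0=0.5, sigma0=None, seed=0):
    """Train a rectangular SOM with the online Kohonen rule (hypothetical settings)."""
    rng = np.random.default_rng(seed)
    n_samples, _ = data.shape
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # Initialise node weight vectors by sampling random rows of the data.
    weights = data[rng.integers(0, n_samples, rows * cols)].astype(float)
    # Fixed (row, col) coordinates of every node on the 2-D map.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                # linearly decaying learning rate
        sigma = sigma0 * np.exp(-3.0 * frac)   # shrinking neighbourhood radius
        x = data[rng.integers(0, n_samples)]   # one randomly chosen sample
        # Best-matching unit (BMU): node whose weight vector is closest to x.
        bmu = np.argmin(np.sum((weights - x) ** 2, axis=1))
        # Gaussian neighbourhood measured on the map grid, not in data space.
        d2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
        h = np.exp(-d2 / (2.0 * sigma ** 2))
        # Pull the BMU and its map neighbours toward the sample.
        weights += lr * h[:, None] * (x - weights)
    return weights, grid

def map_to_nodes(data, weights):
    """Return the index of the best-matching unit for every sample."""
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return np.argmin(d2, axis=1)

# Hypothetical synthetic data: 500 records x 25 stations, two pollution regimes.
rng = np.random.default_rng(1)
low = rng.normal(15.0, 3.0, size=(250, 25))
high = rng.normal(60.0, 5.0, size=(250, 25))
X = np.vstack([low, high])

W, grid = train_som(X, rows=3, cols=3)
labels = map_to_nodes(X, W)
print(W.shape)  # (9, 25)
```

Each trained node holds a 25-dimensional prototype (one value per station), so in a setup like this one the node prototypes, rather than the raw records, would be what gets mapped spatially and summarized over time. The low- and high-concentration records should land on disjoint sets of nodes, which is the kind of cluster separation the topological map is meant to make visible.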

Keywords: Back propagation neural network (BPNN); Gamma Test; Multi-step-ahead prediction; PM2.5; Self-organizing map (SOM); Spatio-temporal variation.