Machine learning driven by environmental covariates to estimate high-resolution PM2.5 in data-poor regions

PeerJ. 2022 Mar 30;10:e13203. doi: 10.7717/peerj.13203. eCollection 2022.

Abstract

PM2.5, fine particulate matter with an aerodynamic equivalent diameter of 2.5 µm or less, degrades air quality and endangers public health. Its spatial distribution is nevertheless poorly understood in data-poor regions where monitoring stations are scarce. We therefore built a random forest (RF) model and a bagging model from ground-monitored PM2.5 data, aerosol optical depth (AOD), meteorological data, and auxiliary geographical variables to estimate the spatial distribution of PM2.5 concentrations in Xinjiang during 2015-2020 at a resolution of 1 km. Both models were verified and compared through 10-fold cross-validation (CV). The results showed the following: (1) The RF model outperformed the bagging model and can therefore be used to estimate PM2.5 concentrations at a relatively high resolution. (2) PM2.5 concentrations were high in southern Xinjiang and low in northern Xinjiang. The high values were concentrated mainly in the Tarim Basin, while most areas of northern Xinjiang maintained low PM2.5 levels year-round. (3) PM2.5 in Xinjiang showed significant seasonality, with seasonally averaged concentrations decreasing in the order winter (71.95 µg m⁻³) > spring (64.76 µg m⁻³) > autumn (46.01 µg m⁻³) > summer (43.40 µg m⁻³). The model offers a way to monitor air quality in data-scarce regions and thereby supports efforts toward sustainable development.
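The 10-fold cross-validation used to compare the two models can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, the function names are hypothetical, and the two stand-in models (a mean predictor and a one-covariate least-squares fit) are not the RF or bagging models of the study. It shows only the procedure: hold out each fold once, predict it with a model trained on the other nine, and score the pooled out-of-fold predictions.

```python
import math
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_mean(xs, ys):
    """Stand-in baseline: always predict the training-set mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_linear(xs, ys):
    """Stand-in model: ordinary least squares on a single covariate."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return lambda x: my + slope * (x - mx)


def cross_validate(fit, xs, ys, k=10, seed=0):
    """Return (R^2, RMSE) computed on pooled out-of-fold predictions."""
    preds = [0.0] * len(xs)
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            preds[i] = model(xs[i])
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot, math.sqrt(ss_res / len(ys))


# Synthetic data (assumption): PM2.5 rising linearly with AOD plus noise.
rng = random.Random(42)
aod = [rng.uniform(0.1, 1.5) for _ in range(200)]
pm25 = [20 + 40 * a + rng.gauss(0, 5) for a in aod]

r2_mean, rmse_mean = cross_validate(fit_mean, aod, pm25)
r2_lin, rmse_lin = cross_validate(fit_linear, aod, pm25)
print(f"mean predictor: R2={r2_mean:.3f} RMSE={rmse_mean:.2f}")
print(f"linear fit:     R2={r2_lin:.3f} RMSE={rmse_lin:.2f}")
```

Passing the same `seed` to both calls gives both models identical folds, so the comparison is not affected by how the data happened to be split.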

Keywords: High-resolution; PM2.5; Random forest; Xinjiang.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Air Pollutants* / analysis
  • Air Pollution* / analysis
  • Environmental Monitoring / methods
  • Machine Learning
  • Particulate Matter / analysis

Substances

  • Air Pollutants
  • Particulate Matter

Grants and funding

This work was supported by the National Natural Science Foundation of China (No. 41961059 and No. 41771470). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.