Estimating PM2.5 Concentrations in the Conterminous United States Using the Random Forest Approach

Environ Sci Technol. 2017 Jun 20;51(12):6936-6944. doi: 10.1021/acs.est.7b01210. Epub 2017 Jun 1.

Abstract

To estimate PM2.5 concentrations, many parametric regression models have been developed, whereas nonparametric machine learning algorithms are used less often and national-scale models are rare. In this paper, we develop a random forest model incorporating aerosol optical depth (AOD) data, meteorological fields, and land use variables to estimate daily 24 h average ground-level PM2.5 concentrations over the conterminous United States in 2011. Random forests are an ensemble learning method that provides predictions with high accuracy and interpretability. Our model achieves an overall cross-validation (CV) R² of 0.80. The mean prediction error (MPE) and root mean squared prediction error (RMSPE) for daily predictions are 1.78 and 2.83 μg/m³, respectively, indicating good agreement between CV predictions and observations. The prediction accuracy of our model is similar to that reported in previous studies using neural networks or regression models at both national and regional scales. In addition, incorporating convolutional layers for land use terms and for nearby PM2.5 measurements increases CV R² by ∼0.02 and ∼0.06, respectively, indicating their significant contributions to prediction accuracy. Two different variable importance measures both indicate that the convolutional layer for nearby PM2.5 measurements and the AOD values are among the most important predictor variables in the training process.
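The abstract relies on two quantitative ingredients without defining them: the convolutional layer built from nearby PM2.5 measurements and the CV accuracy statistics. The Python sketch below illustrates both under stated assumptions and does not reproduce the random forest itself. The convolutional layer is approximated here as an inverse-distance-weighted (IDW) average of surrounding monitors; the IDW kernel, the `power` parameter, and the function names `idw_neighbor_layer` and `cv_metrics` are hypothetical and not taken from the paper. MPE is assumed to be the mean absolute difference between CV predictions and observations, and CV R² is computed as 1 − SS_res/SS_tot, which may differ from the paper's exact definition (for example, a squared correlation).

```python
import math
from typing import Sequence, Tuple


def idw_neighbor_layer(target: Tuple[float, float],
                       monitors: Sequence[Tuple[float, float, float]],
                       power: float = 2.0) -> float:
    """Hypothetical 'convolutional layer': IDW average of nearby PM2.5 values.

    `monitors` holds (x, y, pm25) tuples. Assumption: an inverse-distance
    kernel; the paper's actual kernel may differ. The caller must exclude
    the target site's own monitor, so a zero distance raises an error.
    """
    num = 0.0
    den = 0.0
    for x, y, value in monitors:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            raise ValueError("target coincides with a monitor; exclude it first")
        w = 1.0 / d ** power  # closer monitors get larger weights
        num += w * value
        den += w
    if den == 0.0:
        raise ValueError("no monitors supplied")
    return num / den


def cv_metrics(observed: Sequence[float],
               predicted: Sequence[float]) -> Tuple[float, float, float]:
    """Return (MPE, RMSPE, R^2) for pooled cross-validation predictions.

    Assumptions: MPE = mean |predicted - observed|;
    R^2 = 1 - SS_res / SS_tot (coefficient of determination).
    """
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("need equal-length, non-empty sequences")
    residuals = [p - o for o, p in zip(observed, predicted)]
    mpe = sum(abs(r) for r in residuals) / n
    ss_res = sum(r * r for r in residuals)
    rmspe = math.sqrt(ss_res / n)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    return mpe, rmspe, r2
```

In a cross-validation loop, the neighbor layer for each held-out site would be computed only from monitors in the training folds, so a held-out observation cannot leak into its own predictor.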

MeSH terms

  • Aerosols*
  • Algorithms*
  • Particulate Matter*
  • United States

Substances

  • Aerosols
  • Particulate Matter