Estimating PM2.5 Concentrations in the Conterminous United States Using the Random Forest Approach

Environ Sci Technol. 2017 Jun 20;51(12):6936-6944. doi: 10.1021/acs.est.7b01210. Epub 2017 Jun 1.

Authors

Xuefei Hu, Jessica H Belle, Xia Meng, Avani Wildani, Lance A Waller, Matthew J Strickland¹, Yang Liu

Affiliation

¹ School of Community Health Sciences, University of Nevada, Reno, Reno, Nevada 89557, United States.

PMID: 28534414
DOI: 10.1021/acs.est.7b01210

Abstract

Many parametric regression models have been developed to estimate PM2.5 concentrations, whereas nonparametric machine learning algorithms are used less often and national-scale models are rare. In this paper, we develop a random forest model that incorporates aerosol optical depth (AOD) data, meteorological fields, and land use variables to estimate daily 24 h averaged ground-level PM2.5 concentrations over the conterminous United States in 2011. Random forests are an ensemble learning method that provides predictions with high accuracy and interpretability. Our model achieves an overall cross-validation (CV) R² of 0.80. The mean prediction error (MPE) and root mean squared prediction error (RMSPE) for daily predictions are 1.78 and 2.83 μg/m³, respectively, indicating good agreement between CV predictions and observations. The prediction accuracy of our model is similar to that reported in previous studies using neural networks or regression models at both national and regional scales. In addition, incorporating convolutional layers for land use terms and for nearby PM2.5 measurements increases CV R² by ∼0.02 and ∼0.06, respectively, indicating their significant contributions to prediction accuracy. Two different variable importance measures both indicate that the convolutional layer for nearby PM2.5 measurements and AOD values are among the most important predictor variables in the training process.
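The CV statistics and the nearby-PM2.5 convolutional layer mentioned in the abstract can be illustrated with a short Python sketch. This is not the authors' code, and several conventions in it are assumptions because the abstract does not define them. R² is taken as the squared Pearson correlation between held-out observations and CV predictions, and MPE is taken as the mean absolute prediction error. The convolutional layer is approximated by a hypothetical helper, `neighbor_pm25`, that returns an inverse-distance-weighted mean of PM2.5 at other monitors. Its inverse-distance-squared weighting (`power=2.0`) and planar (x, y) coordinates are likewise assumptions. The random forest itself is not reproduced here; a library implementation such as scikit-learn's `RandomForestRegressor` would typically supply the CV predictions passed to `cv_metrics`.

```python
import math
from typing import Sequence, Tuple


def cv_metrics(observed: Sequence[float], predicted: Sequence[float]) -> Tuple[float, float, float]:
    """Return (R2, MPE, RMSPE) for paired observations and CV predictions.

    Assumed conventions (not stated in the abstract):
      R2    = squared Pearson correlation of observed vs. predicted
      MPE   = mean absolute prediction error
      RMSPE = root mean squared prediction error
    """
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need at least two paired values")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0.0 or var_p == 0.0:
        raise ValueError("R2 is undefined when either series is constant")
    r2 = cov * cov / (var_o * var_p)
    mpe = sum(abs(p - o) for o, p in zip(observed, predicted)) / n
    rmspe = math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)
    return r2, mpe, rmspe


def neighbor_pm25(
    target: Tuple[float, float],
    monitors: Sequence[Tuple[float, float, float]],
    power: float = 2.0,
) -> float:
    """Hypothetical 'convolutional layer' feature for one location.

    target:   (x, y) of the location being predicted (assumed planar units)
    monitors: (x, y, pm25) for monitors reporting on the same day
    Returns the inverse-distance-weighted mean PM2.5 of the other monitors.
    """
    num = 0.0
    den = 0.0
    for x, y, value in monitors:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            # Skip monitors co-located with the target (e.g. its own),
            # so the held-out observation cannot leak into its feature.
            continue
        w = 1.0 / d ** power
        num += w * value
        den += w
    if den == 0.0:
        raise ValueError("no neighboring monitors")
    return num / den
```

For example, `neighbor_pm25((0.0, 0.0), [(0.0, 0.0, 50.0), (1.0, 0.0, 10.0), (2.0, 0.0, 20.0)])` ignores the co-located monitor and weights the other two by 1 and 1/4, giving 12.0. Excluding the target's own monitor is the design choice that matters during CV: without it, the feature would carry the held-out value into its own prediction and inflate CV R².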

MeSH terms

Aerosols*
Algorithms*
Particulate Matter*
United States

Substances

Aerosols
Particulate Matter