Using land use variable information and a random forest approach to correct spatial mean bias in fused CMAQ fields for particulate and gas species

Atmos Environ (1994). 2022 Apr 1:274:118982. doi: 10.1016/j.atmosenv.2022.118982. Epub 2022 Feb 5.

Abstract

Accurate spatiotemporal air pollution fields are essential for health impact and epidemiologic studies. An increasing number of studies have combined observational data with spatiotemporally complete air pollution simulations. Here, land-use variables, speciated gaseous and particulate pollutant concentrations, and chemical transport modeling are fused using a random forest approach to construct daily air quality fields for 12 pollutants (CO, NOx, NO2, SO2, O3, PM2.5, PM10, and the PM2.5 constituents SO42-, NO3-, NH4+, EC and OC) between 2005 and 2014 for the continental United States with little spatial or temporal bias. R2 ranged from 0.45 to 0.96, depending on the pollutant. Additional analysis found that temporal R2 ranged from 0.84 to 0.99 and spatial R2 ranged from 0.76 to 0.96 across species. Four-fold cross-validation was performed to assess the model's predictive power; the cross-validated R2 ranged from 0.40 for PM10 to 0.94 for SO4, with the other pollutants falling within this range. The largest improvements were found for PM10, which had substantial bias in the CMAQ fields that varied east-to-west; the smallest improvements were for SO4, which was already well simulated. The random forest corrections to the simulation biases, while largely consistent year-to-year, did show slight variation, due in part to changes in the distribution of monitors and in CMAQ simulation inputs.
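
The distinction between overall, spatial, and temporal R2 can be illustrated with a minimal Python sketch. The definitions used here are assumptions, since the abstract does not state them: R2 is taken as the squared Pearson correlation, spatial R2 compares site-averaged observed and predicted values, and temporal R2 compares day-averaged values. The function names and the toy records are hypothetical and are not data from the study.

```python
from collections import defaultdict


def r_squared(obs, pred):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_p = sum(pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return cov ** 2 / (var_o * var_p)


def group_means(records, key_index):
    """Average observed and predicted values within each site (0) or day (1)."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for rec in records:
        acc = sums[rec[key_index]]
        acc[0] += rec[2]  # running sum of observed values
        acc[1] += rec[3]  # running sum of predicted values
        acc[2] += 1       # number of records in this group
    keys = sorted(sums)
    obs = [sums[k][0] / sums[k][2] for k in keys]
    pred = [sums[k][1] / sums[k][2] for k in keys]
    return obs, pred


def evaluate(records):
    """Return (overall, spatial, temporal) R2 for (site, day, obs, pred) rows."""
    overall = r_squared([r[2] for r in records], [r[3] for r in records])
    spatial = r_squared(*group_means(records, 0))   # assumed: site means
    temporal = r_squared(*group_means(records, 1))  # assumed: day means
    return overall, spatial, temporal


# Hypothetical toy data: (site, day, observed, predicted); not from the study.
records = [
    ("A", 1, 10.0, 11.0), ("A", 2, 12.0, 12.5), ("A", 3, 9.0, 9.5),
    ("B", 1, 20.0, 18.0), ("B", 2, 23.0, 21.5), ("B", 3, 19.0, 18.5),
    ("C", 1, 5.0, 6.0), ("C", 2, 7.0, 6.5), ("C", 3, 4.0, 5.0),
]
overall, spatial, temporal = evaluate(records)
```

Averaging over sites or days before correlating separates how well the fused field reproduces where concentrations are high from how well it reproduces when they are high, which is one plausible reason the spatial and temporal values can differ from the overall R2.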

Keywords: Air pollution; CMAQ; Gas species; Particulate species; Random forest model; Spatiotemporal pollutant fields.