Correcting Measurement Error in Satellite Aerosol Optical Depth with Machine Learning for Modeling PM2.5 in the Northeastern USA

Allan C Just; Margherita M De Carli; Alexandra Shtein; Michael Dorman; Alexei Lyapustin; Itai Kloog

doi:10.3390/rs10050803

Remote Sens (Basel). 2018 May;10(5):803. doi: 10.3390/rs10050803. Epub 2018 May 22.

Authors

Allan C Just¹, Margherita M De Carli¹, Alexandra Shtein², Michael Dorman², Alexei Lyapustin³, Itai Kloog²

Affiliations

¹ Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
² Department of Geography and Environmental Development, Ben-Gurion University of the Negev, Beer Sheva 8410501, Israel.
³ National Aeronautics and Space Administration (NASA) Goddard Space Flight Center (GSFC), Greenbelt, MD 20771, USA.

Abstract

Satellite-derived estimates of aerosol optical depth (AOD) are key predictors in particulate air pollution models. The multi-step retrieval algorithms that estimate AOD also produce quality control variables, but these have not been systematically used to address the measurement error in AOD. We compare three machine-learning methods for characterizing and correcting this measurement error: random forests, gradient boosting, and extreme gradient boosting (XGBoost). We apply them to the Multi-Angle Implementation of Atmospheric Correction (MAIAC) 1 × 1 km AOD product for the Aqua and Terra satellites across the Northeastern/Mid-Atlantic USA, evaluated against collocated measurements from 79 ground-based AERONET stations over 14 years. Models included 52 quality control, land use, meteorology, and spatially derived features. Variable importance measures suggest that relative azimuth, AOD uncertainty, and the AOD difference in 30-210 km moving windows are among the most important features for predicting measurement error. XGBoost outperformed the other machine-learning approaches, decreasing the root mean squared error in withheld testing data by 43% and 44% for Aqua and Terra, respectively. After correction using XGBoost, the correlation of collocated AOD and daily PM2.5 monitors across the region increased by 10 and 9 percentage points for Aqua and Terra, respectively. We demonstrate how machine learning with quality control and spatial features substantially improves satellite-derived AOD products for air pollution modeling.

Keywords: AERONET; MAIAC; MODIS; PM2.5; aerosol optical depth; air pollution; gradient boosting; machine learning; measurement error.
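
The correction workflow summarized in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the modeling target is satellite AOD minus collocated AERONET AOD, it substitutes a tiny hand-written gradient boosting of decision stumps for XGBoost so that it runs with the standard library alone, and it uses synthetic data. The two features (relative azimuth and AOD uncertainty), the error formula, and all function names are hypothetical.

```python
import math
import random


def rmse(pred, truth):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth))


def fit_stump(X, residuals):
    """Find the one-feature threshold split that minimizes squared error."""
    n, total, best = len(X), sum(residuals), None
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += residuals[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:  # cannot split between tied values
                continue
            right_sum = total - left_sum
            # Maximizing this quantity is equivalent to minimizing the SSE.
            gain = left_sum ** 2 / k + right_sum ** 2 / (n - k)
            if best is None or gain > best[0]:
                best = (gain, j, (lo + hi) / 2, left_sum / k, right_sum / (n - k))
    return best[1:]


def fit_boosted_stumps(X, y, n_rounds=60, learning_rate=0.1):
    """Gradient boosting for squared loss: each stump fits current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [t - p for t, p in zip(y, pred)]
        j, thr, left, right = fit_stump(X, residuals)
        stumps.append((j, thr, left, right))
        pred = [p + learning_rate * (left if row[j] <= thr else right)
                for p, row in zip(pred, X)]
    return base, learning_rate, stumps


def predict(model, X):
    """Sum the shrunken stump outputs on top of the base prediction."""
    base, lr, stumps = model
    return [base + lr * sum(l if row[j] <= thr else r for j, thr, l, r in stumps)
            for row in X]


# Synthetic collocations (ASSUMPTION: invented data, not MAIAC or AERONET).
rng = random.Random(0)
rows = []
for _ in range(400):
    azimuth = rng.uniform(0.0, 180.0)      # hypothetical relative azimuth, degrees
    uncertainty = rng.uniform(0.0, 0.2)    # hypothetical AOD uncertainty
    aeronet_aod = rng.uniform(0.05, 0.5)   # stand-in for the ground reference
    error = 0.8 * uncertainty + 0.0005 * (azimuth - 90.0) + rng.gauss(0.0, 0.01)
    rows.append(([azimuth, uncertainty], aeronet_aod + error, aeronet_aod))

train, test = rows[:300], rows[300:]       # withheld testing data

# Target: measurement error = satellite AOD - reference AOD (assumed definition).
model = fit_boosted_stumps([f for f, _, _ in train],
                           [sat - ref for _, sat, ref in train])

test_X = [f for f, _, _ in test]
raw_aod = [sat for _, sat, _ in test]
reference = [ref for _, _, ref in test]

# Corrected AOD = raw satellite AOD minus the predicted measurement error.
corrected_aod = [s - e for s, e in zip(raw_aod, predict(model, test_X))]

raw_rmse = rmse(raw_aod, reference)
corrected_rmse = rmse(corrected_aod, reference)
reduction_pct = 100.0 * (raw_rmse - corrected_rmse) / raw_rmse
print(f"RMSE raw={raw_rmse:.4f} corrected={corrected_rmse:.4f} "
      f"reduction={reduction_pct:.1f}%")
```

The percentage printed at the end is the same kind of quantity as the RMSE decrease reported in the abstract, but its value here reflects only the synthetic error formula and says nothing about the real data.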

Grants and funding