A land use regression model using machine learning and locally developed low cost particulate matter sensors in Uganda

Eric S Coker; A Kofi Amegah; Ernest Mwebaze; Joel Ssematimba; Engineer Bainomugisha

doi:10.1016/j.envres.2021.111352


Environ Res. 2021 Aug:199:111352. doi: 10.1016/j.envres.2021.111352. Epub 2021 May 24.

Authors

Eric S Coker¹, A Kofi Amegah², Ernest Mwebaze³, Joel Ssematimba⁴, Engineer Bainomugisha⁵

Affiliations

¹ University of Florida, College of Public Health and Health Professions, Department of Environmental and Global Health, Gainesville, FL, USA. Electronic address: eric.coker@phhp.ufl.edu.
² Public Health Research Group, Department of Biomedical Sciences, University of Cape Coast, Cape Coast, Ghana.
³ Sunbird AI, P.O. Box 11296, Kampala, Uganda.
⁴ AirQo, Department of Computer Science, College of Computing and Information Sciences, Makerere University, Plot 56 Pool Road, Kampala, Uganda.
⁵ Sunbird AI, P.O. Box 11296, Kampala, Uganda; AirQo, Department of Computer Science, College of Computing and Information Sciences, Makerere University, Plot 56 Pool Road, Kampala, Uganda.

PMID: 34043968

Abstract

Land use regression (LUR) modeling has rarely been applied to estimate air pollution exposure in sub-Saharan Africa (SSA), largely because the region lacks air quality monitoring networks. Low-cost air quality sensors developed locally in SSA offer a sustainable operating mechanism that may help generate the monitoring data needed to estimate air pollution exposure with LUR models. The primary objective of our study is to investigate whether a network of locally developed low-cost air quality sensors can be used in LUR modeling to accurately predict monthly ambient fine particulate matter (PM2.5) concentrations in urban areas of central and eastern Uganda. Secondarily, we aimed to explore whether machine learning (ML) can improve LUR predictions compared with ordinary least squares (OLS) regression. We used data from a network of 23 low-cost PM2.5 sensors located in urban municipalities of eastern and central Uganda, which collected highly time-resolved PM2.5 measurements between January 1, 2020 and December 31, 2020. We averaged these measurements by month and used the monthly PM2.5 concentrations for LUR prediction modeling. We used eight different ML base-learner algorithms as well as ensemble modeling. We evaluated the models with 5-fold cross-validation (random 80% training/20% test splits), using root mean squared error (RMSE) as the resampling metric. The relative explanatory power and accuracy of the ML algorithms were assessed by comparing the coefficient of determination (R²) and RMSE, using OLS as the reference approach. The overall average PM2.5 concentration during the study period was 52.22 μg/m³ (IQR: 38.11, 62.84 μg/m³), well above the World Health Organization (WHO) ambient air quality guidelines for PM2.5.
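The evaluation scheme described above (5-fold cross-validation scored by RMSE and R², with OLS as the reference model) can be sketched as follows. This is a minimal, standard-library Python illustration under stated assumptions, not the authors' code: the algorithm names xgbTree and glmnet appear to follow the R caret package's naming, whereas here only the OLS baseline is fit. The function names (`fit_ols`, `kfold_cv`, and so on), the two predictors, and the synthetic data are hypothetical. R² is computed as 1 - SS_res/SS_tot; some tools instead report the squared Pearson correlation between observed and predicted values, so the two are not directly interchangeable.

```python
import math
import random


def fit_ols(X, y):
    """Fit y = b0 + b1*x1 + ... + bp*xp by solving the normal equations (X'X)b = X'y."""
    rows = [[1.0] + list(x) for x in X]  # prepend an intercept column
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]  # [b0, b1, ..., bp]


def predict(coef, X):
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], x)) for x in X]


def rmse(obs, pred):
    return math.sqrt(sum((o - q) ** 2 for o, q in zip(obs, pred)) / len(obs))


def r_squared(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(obs) / len(obs)
    ss_res = sum((o - q) ** 2 for o, q in zip(obs, pred))
    ss_tot = sum((o - mean) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def kfold_cv(X, y, k=5, seed=1):
    """Average held-out RMSE and R^2 over k random folds (k=5 holds out ~20% each time)."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        coef = fit_ols([X[i] for i in train_idx], [y[i] for i in train_idx])
        pred = predict(coef, [X[i] for i in test_idx])
        obs = [y[i] for i in test_idx]
        scores.append((rmse(obs, pred), r_squared(obs, pred)))
    return (sum(s[0] for s in scores) / k, sum(s[1] for s in scores) / k)


if __name__ == "__main__":
    # Hypothetical site-month rows: (monthly precipitation in mm, NDVI in a 500-m buffer)
    rng = random.Random(0)
    X = [(rng.uniform(0, 300), rng.uniform(0.1, 0.8)) for _ in range(60)]
    # Synthetic PM2.5 (ug/m3) with noise; NOT the study's data or fitted coefficients
    y = [70 - 0.08 * p - 30 * g + rng.gauss(0, 5) for p, g in X]
    cv_rmse, cv_r2 = kfold_cv(X, y)
    print(f"OLS 5-fold CV: RMSE = {cv_rmse:.2f} ug/m3, R^2 = {cv_r2:.2f}")
```

The cross-validation loop is model-agnostic: comparing an ML learner against the OLS reference would mean swapping `fit_ols`/`predict` for that learner's fit and predict steps while keeping the same folds, so both models are scored on identical held-out data.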
Across the base-learner and ensemble models, RMSE and R² values ranged from 7.65 to 16.85 μg/m³ and from 0.24 to 0.84, respectively. Extreme gradient boosting (xgbTree) performed best among the base-learner algorithms (R² = 0.84; RMSE = 7.65 μg/m³). Ensemble modeling with Lasso and Elastic-Net Regularized Generalized Linear Models (glmnet) did not outperform xgbTree, although its prediction performance was comparable. The most important temporal and spatial predictors of monthly PM2.5 levels were monthly precipitation, the percentage of the population using solid fuels for cooking, distance to Lake Victoria, and greenspace (normalized difference vegetation index, NDVI) within a 500-m buffer of the air monitors. In conclusion, our findings indicate that data from locally developed low-cost PM sensors can be used for spatio-temporal prediction modeling of air pollution exposure in Uganda. Moreover, the non-parametric ML and ensemble approaches to LUR modeling clearly outperformed OLS regression for predicting monthly PM2.5 concentrations. Deploying low-cost air quality sensors together with data quality control measures can help address the critical need to expand and improve air quality monitoring in resource-constrained settings of SSA. Used in conjunction with non-parametric ML algorithms, these low-cost sensors may offer a rapid path forward for PM2.5 exposure assessment and help spur air pollution epidemiology research in the region.
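The abstract does not state how the spatial predictors (greenspace within a 500-m buffer, distance to Lake Victoria) were derived. A common LUR approach is to average a gridded layer within a circular buffer around each monitor and to take the distance from each monitor to the nearest point of a feature such as a shoreline. The sketch below assumes NDVI arrives as a list of cell-centre points and the shoreline as a list of vertices. The helper names, the spherical-Earth haversine distance, and all coordinates and values are illustrative assumptions; a production workflow would normally use a GIS library with projected coordinates and true polygon geometry.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def buffer_mean(site, cells, radius_m=500.0):
    """Mean value of cells whose centres lie within radius_m of the site (NaN if none)."""
    lat, lon = site
    inside = [v for clat, clon, v in cells if haversine_m(lat, lon, clat, clon) <= radius_m]
    return sum(inside) / len(inside) if inside else float("nan")


def nearest_distance_m(site, points):
    """Distance from the site to the closest vertex of a feature, e.g. a digitized shoreline."""
    lat, lon = site
    return min(haversine_m(lat, lon, plat, plon) for plat, plon in points)


if __name__ == "__main__":
    monitor = (0.35, 32.58)  # hypothetical monitor location
    # Hypothetical NDVI cells: (lat, lon, ndvi)
    ndvi_cells = [(0.350, 32.580, 0.42), (0.352, 32.581, 0.55), (0.360, 32.590, 0.71)]
    # Hypothetical shoreline vertices: (lat, lon)
    shoreline = [(0.30, 32.62), (0.25, 32.65)]
    print("NDVI within 500 m:", buffer_mean(monitor, ndvi_cells))
    print("Distance to shoreline (m):", round(nearest_distance_m(monitor, shoreline)))
```

Vertex-based distance slightly overestimates the true distance to a line feature when vertices are sparse, which is one reason dedicated GIS tooling is preferred for real predictor extraction.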

Keywords: Land use regression; Low-cost sensors; Machine learning; Particulate matter.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Air Pollutants* / analysis
Air Pollution* / analysis
Cities
Environmental Monitoring
Machine Learning
Particulate Matter / analysis
Uganda

Substances

Air Pollutants
Particulate Matter