A land use regression model using machine learning and locally developed low cost particulate matter sensors in Uganda

Environ Res. 2021 Aug:199:111352. doi: 10.1016/j.envres.2021.111352. Epub 2021 May 24.

Abstract

Land use regression (LUR) modeling has rarely been used to estimate air pollution exposure in sub-Saharan Africa (SSA). This is generally due to a lack of air quality monitoring networks in the region. Low-cost air quality sensors developed locally in SSA present a sustainable operating mechanism that may help generate the air monitoring data needed to estimate air pollution exposure with LUR models. The primary objective of our study was to investigate whether a network of locally developed low-cost air quality sensors can be used in LUR modeling to accurately predict monthly ambient fine particulate matter (PM2.5) air pollution in urban areas of central and eastern Uganda. Secondarily, we aimed to explore whether the application of machine learning (ML) can improve LUR predictions compared to ordinary least squares (OLS) regression. We used data for the entire year of 2020 from a network of 23 low-cost PM2.5 sensors located in urban municipalities of eastern and central Uganda. Between January 1, 2020 and December 31, 2020, these sensors collected highly time-resolved measurements of ambient PM2.5 concentrations. We averaged these measurements by month and used the monthly averages for LUR prediction modeling of monthly PM2.5 concentrations. We used eight different ML base-learner algorithms as well as ensemble modeling. We evaluated the models by resampling with 5-fold cross-validation (80% training/20% test random splits) and root mean squared error (RMSE). The relative explanatory power and accuracy of the ML algorithms were evaluated by comparing the coefficient of determination (R2) and RMSE, using OLS as the reference approach. The overall average PM2.5 concentration during the study period was 52.22 μg/m3 (IQR: 38.11, 62.84 μg/m3), well above World Health Organization PM2.5 ambient air guidelines.
Across the base-learner and ensemble models, RMSE ranged from 7.65 to 16.85 μg/m3 and R2 ranged from 0.24 to 0.84. Extreme gradient boosting (xgbTree) performed best among the base-learner algorithms (R2 = 0.84; RMSE = 7.65 μg/m3). Ensemble modeling with Lasso and Elastic-Net Regularized Generalized Linear Models (glmnet) did not outperform xgbTree, but its prediction performance was comparable. The most important temporal and spatial predictors of monthly PM2.5 levels were monthly precipitation, percent of the population using solid fuels for cooking, distance to Lake Victoria, and greenspace (NDVI) within a 500-m buffer of air monitors. In conclusion, our results provide evidence that data from locally developed low-cost PM sensors can be used for spatio-temporal prediction modeling of air pollution exposures in Uganda. Moreover, the non-parametric ML and ensemble approaches to LUR modeling clearly outperformed the OLS regression algorithm for the prediction of monthly PM2.5 concentrations. Deploying low-cost air quality sensors in concert with data quality control measures can help address the critical need for expanding and improving air quality monitoring in resource-constrained settings of sub-Saharan Africa. These low-cost sensors, in conjunction with non-parametric ML algorithms, may provide a rapid path forward for PM2.5 exposure assessment and spur air pollution epidemiology research in the region.
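
The evaluation scheme described in the abstract (5-fold cross-validation with 80%/20% random splits, scored by RMSE and R2, with OLS as the reference) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the study's code or data: the precipitation and PM2.5 values are synthetic, the single-predictor OLS is a simplification of a multi-predictor LUR model, and a k-nearest-neighbour regressor stands in for a non-parametric learner in place of xgbTree. Pooling the out-of-fold predictions before scoring is also a choice of this sketch; the abstract does not say how fold results were combined.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_ols(x, y):
    """Single-predictor ordinary least squares; returns a predict function."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda v: intercept + slope * v


def fit_knn(x, y, k=3):
    """k-nearest-neighbour regression (stand-in non-parametric learner)."""
    pairs = list(zip(x, y))

    def predict(v):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - v))[:k]
        return sum(p[1] for p in nearest) / len(nearest)

    return predict


def cross_validate(x, y, fit, k=5, seed=0):
    """Train on k-1 folds, predict the held-out fold, score pooled predictions."""
    y_true, y_pred = [], []
    for test_idx in k_fold_indices(len(x), k, seed):
        held_out = set(test_idx)
        train_x = [x[i] for i in range(len(x)) if i not in held_out]
        train_y = [y[i] for i in range(len(y)) if i not in held_out]
        model = fit(train_x, train_y)
        for i in test_idx:
            y_true.append(y[i])
            y_pred.append(model(x[i]))
    return rmse(y_true, y_pred), r_squared(y_true, y_pred)


# Synthetic, hypothetical data: monthly precipitation (mm) and PM2.5 (ug/m3)
# with a nonlinear relationship plus noise. These are NOT study measurements.
rng = random.Random(42)
precip = [rng.uniform(0, 250) for _ in range(120)]
pm25 = [25 + 60 * math.exp(-p / 60) + rng.gauss(0, 4) for p in precip]

for name, fit in [("OLS (reference)", fit_ols), ("kNN (stand-in)", fit_knn)]:
    model_rmse, model_r2 = cross_validate(precip, pm25, fit)
    print(f"{name}: RMSE = {model_rmse:.2f} ug/m3, R2 = {model_r2:.2f}")
```

With 120 observations and k = 5, each held-out fold contains 24 observations, which reproduces the 80%/20% split. Using the same seed for both models means they are scored on identical folds, so the comparison against the OLS reference is like for like.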

Keywords: Land use regression; Low-cost sensors; Machine learning; Particulate matter.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Air Pollutants* / analysis
  • Air Pollution* / analysis
  • Cities
  • Environmental Monitoring
  • Machine Learning
  • Particulate Matter / analysis
  • Uganda

Substances

  • Air Pollutants
  • Particulate Matter