A robust approach to deriving long-term daily surface NO2 levels across China: Correction to substantial estimation bias in back-extrapolation

Environ Int. 2021 Sep:154:106576. doi: 10.1016/j.envint.2021.106576. Epub 2021 Apr 23.

Abstract

Background: Long-term surface NO2 data are essential for retrospective policy evaluation and chronic human exposure assessment. In the absence of NO2 observations for Mainland China before 2013, training a model with 2013-2018 data to make predictions for 2005-2012 (back-extrapolation) could cause substantial estimation bias due to concept drift.

Objective: This study aims to correct the estimation bias in order to reconstruct the spatiotemporal distribution of daily surface NO2 levels across China during 2005-2018.

Methods: Using ground- and satellite-based data, we proposed robust back-extrapolation with a random forest (RBE-RF), which simulates surface NO2 through intermediate modeling of scaling factors. For comparison, we also employed a random forest that directly models surface NO2 levels (Base-RF), representing the commonly used approach.

Results: Validation against Taiwan's NO2 observations during 2005-2012 showed that RBE-RF adequately corrected the substantial underestimation by Base-RF. The RMSE decreased from 10.1 to 8.2 µg/m3, from 7.1 to 4.3 µg/m3, and from 6.1 to 2.9 µg/m3 in predicting daily, monthly, and annual levels, respectively. For North China, which had the most severe pollution, the population-weighted NO2 ([NO2]pw) during 2005-2012 was estimated as 40.2 and 50.9 µg/m3 by Base-RF and RBE-RF, respectively, a 21.0% difference relative to the RBE-RF estimate. While both models predicted that the national annual [NO2]pw increased during 2005-2011 and then decreased, Base-RF underestimated the interannual trends by >50.2% relative to RBE-RF. During 2005-2018, the nationwide population living in areas with NO2 > 40 µg/m3 was estimated as 259 and 460 million by Base-RF and RBE-RF, respectively.
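The arithmetic behind the population-weighted figures can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes [NO2]pw is the usual population-weighted mean, sum(c_i * p_i) / sum(p_i) over grid cells, and the function names and the three-cell grid are hypothetical. The abstract does not state the denominator of the 21.0% figure; the value is reproduced when the difference is taken relative to the RBE-RF estimate (taking the Base-RF estimate as the denominator would give about 26.6%).

```python
def population_weighted_mean(concentrations, populations):
    """Population-weighted mean: sum(c_i * p_i) / sum(p_i)."""
    total_pop = sum(populations)
    if total_pop <= 0:
        raise ValueError("total population must be positive")
    weighted_sum = sum(c * p for c, p in zip(concentrations, populations))
    return weighted_sum / total_pop


def relative_difference_pct(low, high):
    """Difference between two estimates as a percentage of the higher one."""
    return (high - low) / high * 100.0


# Hypothetical three-cell grid; these values are invented for illustration.
no2 = [20.0, 40.0, 60.0]      # surface NO2 per cell, in µg/m3
pop = [1_000, 3_000, 6_000]   # residents per cell
pw = population_weighted_mean(no2, pop)

# Estimates reported in the abstract for North China, 2005-2012:
# Base-RF 40.2 µg/m3, RBE-RF 50.9 µg/m3.
diff = relative_difference_pct(40.2, 50.9)

print(f"[NO2]pw for the toy grid: {pw:.1f} µg/m3")
print(f"Base-RF vs RBE-RF: {diff:.1f}% of the RBE-RF estimate")
```

Because the most populated cell in the toy grid is also the most polluted, the weighted mean sits above the simple average of the three cells, which is why population weighting matters for exposure assessment.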

Conclusion: With RBE-RF, we corrected the estimation bias in back-extrapolation and obtained a full-coverage dataset of daily surface NO2 across China during 2005-2018, which is valuable for environmental management and epidemiological research.

Keywords: Back extrapolation; Concept drift; Exposure assessment; Long term; Machine learning; Nitrogen dioxide.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Air Pollutants* / analysis
  • Air Pollution* / analysis
  • China
  • Environmental Monitoring
  • Humans
  • Nitrogen Dioxide / analysis
  • Particulate Matter / analysis
  • Retrospective Studies

Substances

  • Air Pollutants
  • Particulate Matter
  • Nitrogen Dioxide