Machine Learning-Aided Causal Inference Framework for Environmental Data Analysis: A COVID-19 Case Study

Environ Sci Technol. 2021 Oct 5;55(19):13400-13410. doi: 10.1021/acs.est.1c02204. Epub 2021 Sep 24.

Abstract

Links between environmental conditions (e.g., meteorological factors and air quality) and COVID-19 severity have been reported worldwide. However, existing data analysis frameworks are poorly suited to investigating the potential causality behind these associations, which involve multidimensional factors and complicated interrelationships. Thus, a causal inference framework based on the structural causal model and aided by machine learning methods was proposed and applied to examine the potential causal relationships between COVID-19 severity and 10 environmental factors (NO2, O3, PM2.5, PM10, SO2, CO, average air temperature, atmospheric pressure, relative humidity, and wind speed) in 166 Chinese cities. The cities were grouped into three clusters based on socio-economic features, and time-series data from the cities in each cluster were analyzed for different pandemic phases. The robustness check refuted the estimates for most potential causal relationships (89 out of 90). Only one potential relationship, involving air temperature, passed the final test, with a causal effect of 0.041 under a specific cluster-phase condition. The results indicate that the environmental factors are unlikely to cause noticeable aggravation of the COVID-19 pandemic. This study also demonstrates the value and potential of the proposed method for investigating causal questions with observational data in environmental and other fields.
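
The estimate-then-refute idea summarized above can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic, the variable names (`humidity`, `temperature`, `severity`) and helper functions are hypothetical, and a plain linear regression stands in for the machine learning estimators. It shows how an association can vanish after adjusting for a confounder, and how a placebo-treatment check (one common kind of robustness test, assumed here for illustration) probes an estimate.

```python
import numpy as np


def ols_effect(treatment, outcome, confounders=None):
    """Return the OLS coefficient on `treatment`, optionally adjusting for confounders."""
    columns = [np.ones(len(treatment)), treatment]
    if confounders is not None:
        columns.append(confounders)
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return coef[1]  # index 0 is the intercept, index 1 is the treatment


def placebo_effects(treatment, outcome, confounders, n_rounds=200, seed=0):
    """Re-estimate the effect after shuffling the treatment (placebo check).

    A shuffled treatment cannot cause the outcome, so these estimates
    should scatter around zero if the estimator is behaving sensibly.
    """
    rng = np.random.default_rng(seed)
    return np.array([
        ols_effect(rng.permutation(treatment), outcome, confounders)
        for _ in range(n_rounds)
    ])


# Synthetic data (assumption): humidity drives both temperature and severity,
# while temperature itself has NO causal effect on severity.
rng = np.random.default_rng(42)
n = 2000
humidity = rng.normal(size=n)
temperature = 0.8 * humidity + rng.normal(size=n)
severity = 0.5 * humidity + rng.normal(size=n)

naive = ols_effect(temperature, severity)               # confounded association
adjusted = ols_effect(temperature, severity, humidity)  # backdoor-adjusted estimate
placebo = placebo_effects(temperature, severity, humidity)

print(f"naive estimate:    {naive:.3f}")
print(f"adjusted estimate: {adjusted:.3f}")
print(f"placebo mean:      {placebo.mean():.3f}")
```

In this toy setting the naive estimate is clearly nonzero even though no causal effect exists, while the adjusted estimate is close to zero. That mirrors, in a highly simplified way, how an observed association can fail to hold up as a causal relationship.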

Keywords: COVID-19; air pollutant; causal inference; machine learning; meteorological factor; structural causal model.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Air Pollution*
  • COVID-19*
  • Humans
  • Machine Learning
  • Pandemics
  • SARS-CoV-2