A causal data fusion method for the general exposure and outcome

Stat Med. 2022 Jan 30;41(2):328-339. doi: 10.1002/sim.9239. Epub 2021 Nov 2.

Abstract

With the advent of the big data era, the need to combine multiple individual data sets to estimate causal effects arises naturally in many medical and biological applications, especially when no single data set measures enough confounders to infer the causal effect of an exposure on an outcome. In this article, we extend the method proposed by a previous study to causal data fusion of more than two data sets without external validation and to a more general (continuous or discrete) exposure and outcome. Theoretically, we obtain the condition for identifiability of exposure effects using multiple individual data sources for a continuous or discrete exposure and outcome. The simulation results show that our proposed causal data fusion method yields unbiased causal effect estimates with higher precision than traditional regression, meta-analysis, and statistical matching methods. We further apply our method to study the causal effect of BMI on glucose level in individuals with diabetes by combining two data sets. Our method is essential for causal data fusion and provides important insights into the ongoing discourse on the empirical analysis of merging multiple individual data sources.

Keywords: causal diagram; causal inference; data fusion; identification.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Causality
  • Computer Simulation
  • Humans
  • Meta-Analysis as Topic
  • Research Design*