Constructing Causal Life-Course Models: Comparative Study of Data-Driven and Theory-Driven Approaches

Am J Epidemiol. 2023 Nov 3;192(11):1917-1927. doi: 10.1093/aje/kwad144.

Abstract

Life-course epidemiology relies on specifying complex (causal) models that describe how variables interplay over time. Traditionally, such models have been constructed by perusing existing theory and previous studies. By comparing data-driven and theory-driven models, we investigated whether data-driven causal discovery algorithms can help in this process. We focused on a longitudinal data set on a cohort of Danish men (the Metropolit Study, 1953-2017). The theory-driven models were constructed by 2 subject-field experts. The data-driven models were constructed by use of the temporal Peter-Clark (TPC) algorithm. The TPC algorithm utilizes the temporal information embedded in life-course data. We found that the data-driven models recovered some, but not all, causal relationships included in the theory-driven expert models. The data-driven method was especially good at identifying direct causal relationships that the experts had high confidence in. Moreover, in a post hoc assessment, we found that most of the direct causal relationships proposed by the data-driven model but not included in the theory-driven model were plausible. Thus, the data-driven model may propose additional meaningful causal hypotheses that are new or have been overlooked by the experts. In conclusion, data-driven methods can aid causal model construction in life-course epidemiology, and combining both data-driven and theory-driven methods can lead to even stronger models.
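The abstract describes the TPC algorithm only at a high level. To show roughly how temporal information can constrain a PC-style search, here is a minimal Python sketch. It is not the authors' implementation, and it rests on several labeled assumptions:

- Variables are assigned to ordered life-course tiers.
- When testing whether two variables are adjacent, conditioning sets exclude variables from tiers later than both of them.
- Cross-tier edges are oriented forward in time, and the v-structure and Meek-rule orientation steps are omitted.
- A real conditional independence test is replaced by an oracle for a hypothetical four-variable DAG.

All variable names and the "expert" graph are invented for illustration. The final comparison mirrors the abstract's split into relationships that the data-driven model recovered, missed, or newly proposed.

```python
from itertools import combinations

# Hypothetical example: variables assigned to life-course tiers
# (0 = childhood, 1 = youth, 2 = adulthood). Names are illustrative only.
TIERS = {"parental_ses": 0, "education": 1, "adult_income": 2, "adult_health": 2}


def oracle(a, b, cond):
    """Stand-in CI test: d-separation in the toy DAG
    parental_ses -> education -> {adult_income, adult_health}.
    Pairs involving 'education' are adjacent, so always dependent; every
    other pair is independent exactly when 'education' is conditioned on."""
    if "education" in (a, b):
        return False
    return "education" in cond


def temporal_skeleton(tiers, indep):
    """PC-style skeleton search with an assumed temporal restriction:
    when testing a - b, only condition on neighbours of a that are not
    in a later tier than both a and b."""
    nodes = sorted(tiers)
    adj = {v: set(nodes) - {v} for v in nodes}
    sepsets = {}
    level = 0
    while True:
        tested_any = False
        for a in nodes:
            for b in sorted(adj[a]):  # iterate over a copy; adj[a] may shrink
                latest = max(tiers[a], tiers[b])
                cands = [c for c in sorted(adj[a] - {b}) if tiers[c] <= latest]
                if len(cands) < level:
                    continue
                tested_any = True
                for cond in combinations(cands, level):
                    if indep(a, b, set(cond)):
                        adj[a].discard(b)
                        adj[b].discard(a)
                        sepsets[frozenset((a, b))] = set(cond)
                        break
        if not tested_any:
            return adj, sepsets
        level += 1


def orient_by_time(adj, tiers):
    """Direct each cross-tier edge forward in time; same-tier edges stay
    undirected (v-structure and Meek-rule steps are omitted here)."""
    directed, undirected = set(), set()
    for a in adj:
        for b in adj[a]:
            if tiers[a] < tiers[b]:
                directed.add((a, b))
            elif tiers[a] == tiers[b] and a < b:
                undirected.add((a, b))
    return directed, undirected


def compare(learned, expert):
    """Split directed edges into recovered, missed, and newly proposed."""
    return {
        "recovered": learned & expert,
        "missed": expert - learned,
        "proposed": learned - expert,
    }


adj, sepsets = temporal_skeleton(TIERS, oracle)
learned, same_tier = orient_by_time(adj, TIERS)
# Hypothetical expert graph, invented for illustration.
expert = {("parental_ses", "education"), ("education", "adult_income"),
          ("parental_ses", "adult_health")}
result = compare(learned, expert)
print(sorted(result["proposed"]))  # [('education', 'adult_health')]
```

In this toy run, the temporal restriction means the parental_ses and education pair is never tested conditional on adulthood variables. The comparison step then reports one expert edge that the learned graph misses and one learned edge that the expert graph lacks. The abstract notes that, in the actual study, most such additional edges were judged plausible in a post hoc assessment.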

Keywords: Peter-Clark algorithm; causal discovery; causal models; directed acyclic graphs; life-course studies; longitudinal studies; structure learning.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Causality
  • Humans
  • Male
  • Models, Theoretical*