Data-Driven Model Building for Life-Course Epidemiology

Am J Epidemiol. 2021 Sep 1;190(9):1898-1907. doi: 10.1093/aje/kwab087.

Abstract

Life-course epidemiology is useful for describing and analyzing complex etiological mechanisms for disease development, but existing statistical methods are essentially confirmatory, because they rely on a priori model specification. This limits the scope of causal inquiries that can be made, because these methods are suited mostly to examine well-known hypotheses that do not question our established view of health, which could lead to confirmation bias. We propose an exploratory alternative. Instead of specifying a life-course model prior to data analysis, our method infers the life-course model directly from the data. Our proposed method extends the well-known Peter-Clark (PC) algorithm (named after its authors) for causal discovery, and it facilitates including temporal information for inferring a model from observational data. The extended algorithm is called temporal PC. The obtained life-course model can afterward be perused for interesting causal hypotheses. Our method complements classical confirmatory methods and guides researchers in expanding their models in new directions. We showcase the method using a data set encompassing almost 3,000 Danish men followed from birth until age 65 years. Using this data set, we inferred life-course models for the role of socioeconomic and health-related factors on development of depression.
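
To illustrate how temporal information can constrain an inferred model, the following is a minimal sketch and not the authors' implementation of temporal PC. It assumes that an undirected skeleton has already been estimated (for example, by the conditional independence tests of the PC algorithm) and that each variable has been assigned to a life-course period. Edges that cross periods are then oriented from the earlier to the later period, while edges within a period are left undirected. The function name, variable names, and period assignments are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def orient_by_period(
    skeleton_edges: List[Edge], period: Dict[str, int]
) -> Tuple[List[Edge], List[Edge]]:
    """Orient skeleton edges using temporal order (illustrative sketch only).

    An edge between variables from different periods is directed from the
    earlier period to the later one, since a later variable cannot cause an
    earlier one. Edges within the same period stay undirected.
    """
    directed: List[Edge] = []
    undirected: List[Edge] = []
    for a, b in skeleton_edges:
        if period[a] < period[b]:
            directed.append((a, b))  # a precedes b, so a -> b
        elif period[b] < period[a]:
            directed.append((b, a))  # b precedes a, so b -> a
        else:
            # Same period: temporal order alone gives no orientation.
            undirected.append((min(a, b), max(a, b)))
    return directed, undirected


# Hypothetical variables and period labels (not taken from the study data).
period = {
    "birth_weight": 0,
    "childhood_ses": 1,
    "adult_education": 2,
    "adult_income": 2,
    "depression": 3,
}
skeleton = [
    ("childhood_ses", "birth_weight"),
    ("adult_education", "childhood_ses"),
    ("adult_income", "adult_education"),
    ("depression", "adult_income"),
]
directed, undirected = orient_by_period(skeleton, period)
```

In this sketch, only the within-period edge between the two adult variables remains undirected. In a full causal discovery procedure, such edges would be left to the orientation rules of the underlying algorithm.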

Keywords: causal discovery; life-course epidemiology; observational data; structure learning.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adolescent
  • Adult
  • Aged
  • Algorithms
  • Causality
  • Child
  • Child, Preschool
  • Denmark / epidemiology
  • Depression / epidemiology
  • Depression / etiology
  • Epidemiologic Methods*
  • Humans
  • Infant
  • Infant, Newborn
  • Male
  • Middle Aged
  • Models, Statistical*
  • Socioeconomic Factors
  • Young Adult