Recent Methodological Trends in Epidemiology: No Need for Data-Driven Variable Selection?

Am J Epidemiol. 2024 Feb 5;193(2):370-376. doi: 10.1093/aje/kwad193.

Abstract

Variable selection in regression models is a particularly important issue in epidemiology, where one usually encounters observational studies. In contrast to randomized trials or experiments, confounding is often not controlled by the study design, but has to be accounted for by suitable statistical methods. For instance, when risk factors should be identified with unconfounded effect estimates, multivariable regression techniques can help to adjust for confounders. We investigated the current practice of variable selection in 4 major epidemiologic journals in 2019 and found that the majority of articles used subject-matter knowledge to determine a priori the set of included variables. In comparison with previous reviews from 2008 and 2015, fewer articles applied data-driven variable selection. Furthermore, for most articles the main aim of analysis was hypothesis-driven effect estimation in rather low-dimensional data situations (i.e., large sample size compared with the number of variables). Based on our results, we discuss the role of data-driven variable selection in epidemiology.
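The adjustment idea mentioned in the abstract can be illustrated with a small simulation. This is a minimal sketch and not the authors' analysis: the data-generating model, the coefficient values, and the function names `simulate` and `ols_coefficients` are assumptions made purely for illustration. It mimics a low-dimensional setting (large sample, 2 covariates) in which the confounder is chosen a priori rather than by a data-driven procedure.

```python
import numpy as np


def simulate(n=5000, true_effect=0.5, seed=0):
    """Simulate an exposure-outcome pair sharing one confounder (hypothetical model)."""
    rng = np.random.default_rng(seed)
    confounder = rng.normal(size=n)
    # The confounder influences both the exposure and the outcome.
    exposure = 0.8 * confounder + rng.normal(size=n)
    outcome = true_effect * exposure + 1.0 * confounder + rng.normal(size=n)
    return exposure, confounder, outcome


def ols_coefficients(columns, outcome):
    """Ordinary least squares fit; returns [intercept, slope_1, slope_2, ...]."""
    design = np.column_stack([np.ones(len(outcome)), *columns])
    beta, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return beta


exposure, confounder, outcome = simulate()

# Crude model: outcome regressed on the exposure only (confounded estimate).
crude = ols_coefficients([exposure], outcome)[1]

# Adjusted model: the confounder is included based on prior knowledge.
adjusted = ols_coefficients([exposure, confounder], outcome)[1]

print(f"crude estimate:    {crude:.3f}")
print(f"adjusted estimate: {adjusted:.3f}")
```

Under these assumptions the crude coefficient overstates the simulated effect of 0.5, while the adjusted coefficient lies close to it. The sketch covers only a single, known, measured confounder; it says nothing about unmeasured confounding or about how a covariate set should be chosen in a real study.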

Keywords: confounding; epidemiologic methods; modeling; regression; variable selection.

MeSH terms

  • Humans
  • Regression Analysis
  • Research Design*
  • Sample Size