Constructing Epidemiologic Cohorts from Electronic Health Record Data

Int J Environ Res Public Health. 2021 Dec 14;18(24):13193. doi: 10.3390/ijerph182413193.

Abstract

In the United States, electronic health records (EHRs) are increasingly being incorporated into healthcare organizations to document patient health and services rendered. EHRs serve as a vast repository of demographic, diagnostic, procedural, therapeutic, and laboratory test data generated during the routine provision of health care. The appeal of using EHR data for epidemiologic research is clear: EHRs generate large datasets on real-world patient populations in an easily retrievable form, permitting the cost-efficient execution of epidemiologic studies on a wide array of topics. The defining feature of constructing epidemiologic cohorts from EHR data is the development of data machinery, which transforms raw EHR data into an epidemiologic dataset from which appropriate inference can be drawn. Though data machinery includes many features, the current report focuses on three aspects of machinery development of high salience to EHR-based epidemiology: (1) selecting study participants; (2) defining "baseline" and assembling baseline characteristics; and (3) following participants for future outcomes. For each, the defining features and unique challenges with respect to EHR-based epidemiology are discussed. An ongoing example illustrates key points. EHR-based epidemiology will become more prominent as EHR data sources continue to proliferate. Epidemiologists must continue to improve the methods of EHR-based epidemiology given the relevance of EHRs in today's healthcare ecosystem.
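
The three aspects of data machinery named in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the article's own example or method: the record layout (`patient_id`, `date`, `codes`), the enrollment window, the study end date, the choice of the first in-window encounter as baseline, and the ICD-10 codes used for baseline conditions and for the outcome are all hypothetical, and every data value is made up.

```python
from datetime import date

# Hypothetical raw EHR extract: one row per encounter (all values are made up).
ENCOUNTERS = [
    {"patient_id": "A", "date": date(2015, 3, 1), "codes": {"I10"}},
    {"patient_id": "A", "date": date(2016, 6, 15), "codes": {"E11"}},
    {"patient_id": "A", "date": date(2018, 1, 10), "codes": {"I21"}},
    {"patient_id": "B", "date": date(2016, 2, 20), "codes": set()},
    {"patient_id": "B", "date": date(2019, 9, 5), "codes": {"I10"}},
    {"patient_id": "C", "date": date(2012, 5, 5), "codes": {"I10"}},
]

# Assumed study parameters, chosen only for illustration.
ENROLL_START = date(2016, 1, 1)
ENROLL_END = date(2016, 12, 31)
STUDY_END = date(2019, 12, 31)
OUTCOME_CODE = "I21"  # assumed outcome definition
BASELINE_CODES = {"I10": "hypertension", "E11": "diabetes"}


def build_cohort(encounters):
    """Turn encounter-level rows into one analysis row per cohort member."""
    # Group encounters by patient in chronological order.
    by_patient = {}
    for enc in sorted(encounters, key=lambda e: e["date"]):
        by_patient.setdefault(enc["patient_id"], []).append(enc)

    cohort = []
    for pid, encs in sorted(by_patient.items()):
        # (1) Selection: require an encounter inside the enrollment window.
        eligible = [e for e in encs if ENROLL_START <= e["date"] <= ENROLL_END]
        if not eligible:
            continue
        # (2) Baseline: the first in-window encounter; characteristics come
        # from any code recorded on or before that date.
        baseline = eligible[0]["date"]
        prior_codes = set()
        for e in encs:
            if e["date"] <= baseline:
                prior_codes |= e["codes"]
        row = {"patient_id": pid, "baseline": baseline}
        for code, name in BASELINE_CODES.items():
            row[name] = code in prior_codes

        # (3) Follow-up: stop at the first outcome after baseline; otherwise
        # censor at the last observed encounter before the study end.
        after = [e for e in encs if baseline < e["date"] <= STUDY_END]
        events = [e["date"] for e in after if OUTCOME_CODE in e["codes"]]
        if events:
            end, row["event"] = events[0], True
        else:
            end = after[-1]["date"] if after else baseline
            row["event"] = False
        row["followup_days"] = (end - baseline).days
        cohort.append(row)
    return cohort


cohort = build_cohort(ENCOUNTERS)
for row in cohort:
    print(row)
```

In this toy extract, patient C has no encounter in the enrollment window and is excluded, patient A contributes an outcome event, and patient B is censored at the last recorded encounter. Censoring at the last encounter is only one possible design choice; a real study would need to justify how the end of observation is determined from EHR data.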

Keywords: cohort studies; electronic health records; epidemiology; retrospective studies.

MeSH terms

  • Delivery of Health Care
  • Ecosystem*
  • Electronic Health Records*
  • Humans
  • Information Storage and Retrieval
  • United States