Extracting patient-level data from the electronic health record: Expanding opportunities for health system research

PLoS One. 2023 Mar 10;18(3):e0280342. doi: 10.1371/journal.pone.0280342. eCollection 2023.

Abstract

Background: Epidemiological studies of interstitial lung disease (ILD) are limited by small numbers and tertiary care bias. Investigators have leveraged the widespread use of electronic health records (EHRs) to overcome these limitations, but struggle to extract patient-level, longitudinal clinical data needed to address many important research questions. We hypothesized that we could automate longitudinal ILD cohort development using the EHR of a large, community-based healthcare system.

Study design and methods: We applied a previously validated algorithm to the EHR of a community-based healthcare system to identify ILD cases between 2012 and 2020. We then extracted disease-specific characteristics and outcomes using fully automated data-extraction algorithms and natural language processing of selected free text.
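The abstract does not describe the extraction algorithms in detail. To illustrate the kind of longitudinal outcome reported in the Results (the share of patients hospitalized in each post-diagnosis year), here is a minimal Python sketch. It assumes each case already has an index date and a follow-up end date. The record types, field names, the 365-day year, and the rule that only patients followed for a whole year count toward that year's denominator are all hypothetical choices, not the study's method.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Patient:
    patient_id: str
    index_date: date    # hypothetical: date the ILD case definition was first met
    followup_end: date  # hypothetical: last date the patient was observed in the EHR


@dataclass
class Encounter:
    patient_id: str
    encounter_date: date
    setting: str        # hypothetical values: "inpatient" or "outpatient"


def annual_hospitalization(patients: List[Patient],
                           encounters: List[Encounter],
                           years: int = 3) -> List[Optional[float]]:
    """Share of patients with >= 1 inpatient encounter in each post-diagnosis year.

    Year k covers days [k*365, (k+1)*365) after the index date. A patient enters
    year k's denominator only if follow-up spans that whole year (an illustrative
    assumption). Returns None for years with no eligible patients.
    """
    by_id = {p.patient_id: p for p in patients}
    hospitalized = [set() for _ in range(years)]

    for e in encounters:
        p = by_id.get(e.patient_id)
        if p is None or e.setting != "inpatient":
            continue
        offset = (e.encounter_date - p.index_date).days
        if offset < 0:
            continue  # pre-diagnosis encounter: outside the post-diagnosis window
        k = offset // 365
        if k < years:
            hospitalized[k].add(p.patient_id)  # a set counts each patient once per year

    rates: List[Optional[float]] = []
    for k in range(years):
        eligible = {p.patient_id for p in patients
                    if (p.followup_end - p.index_date).days >= (k + 1) * 365}
        if not eligible:
            rates.append(None)
        else:
            rates.append(len(hospitalized[k] & eligible) / len(eligible))
    return rates
```

Counting each patient at most once per year keeps the measure a proportion of patients hospitalized, rather than a rate of admissions per patient-year.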

Results: We identified a community cohort of 5,399 ILD patients (prevalence = 118 per 100,000). Pulmonary function tests (71%) and serologies (54%) were commonly used in the diagnostic evaluation, whereas lung biopsy was rare (5%). Idiopathic pulmonary fibrosis (IPF) was the most common ILD diagnosis (n = 972, 18%). Prednisone was the most commonly prescribed medication (n = 911, 17%). Nintedanib and pirfenidone were rarely prescribed (n = 305, 5%). ILD patients were high utilizers of inpatient care (40%/year hospitalized) and outpatient care (80%/year with a pulmonary visit), with sustained utilization throughout the post-diagnosis study period.
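As a back-of-envelope check on the reported figures, the following Python sketch computes cohort shares from the counts and inverts a crude prevalence formula to estimate the implied population denominator. It assumes prevalence is a simple, unadjusted cases-per-population ratio. The study's actual denominator (for example, health-system members during a specific period) is not given in the abstract, so the implied population is only approximate. The function names are hypothetical.

```python
def prevalence_per_100k(cases: int, population: int) -> float:
    """Crude (unadjusted) prevalence per 100,000 people."""
    return cases / population * 100_000


def implied_population(cases: int, per_100k: float) -> float:
    """Invert the crude prevalence formula to recover the denominator."""
    return cases / per_100k * 100_000


def share_percent(n: int, total: int) -> float:
    """Percentage of the cohort with a given characteristic."""
    return 100 * n / total


COHORT = 5_399  # ILD cases reported in the abstract

print(round(implied_population(COHORT, 118)))  # about 4.58 million people
print(round(share_percent(972, COHORT)))       # IPF diagnosis share, as a whole percent
print(round(share_percent(911, COHORT)))       # prednisone share, as a whole percent
```

Under this assumption, 5,399 cases at 118 per 100,000 correspond to a source population of roughly 4.6 million.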

Discussion: We demonstrated the feasibility of robustly characterizing a variety of patient-level utilization and health services outcomes in a community-based EHR cohort. This represents a substantial methodological improvement by alleviating traditional constraints on the accuracy and clinical resolution of such ILD cohorts; we believe this approach will make community-based ILD research more efficient, effective, and scalable.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Electronic Health Records*
  • Humans
  • Lung
  • Lung Diseases, Interstitial* / diagnosis

Grants and funding

This study was funded by an Investigator Initiated Study grant from Genentech (https://www.gene.com/), HL12131, awarded to HRC and CI. The Pulmonary Fibrosis Foundation (https://www.pulmonaryfibrosis.org/researchers-healthcare-providers/research-opportunities/scholars) awarded a research grant to EF. Neither funder had any role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.