Comparing automated vs. manual data collection for COVID-specific medications from electronic health records

Andrew L Yin; Winston L Guo; Evan T Sholle; Mangala Rajan; Mark N Alshak; Justin J Choi; Parag Goyal; Assem Jabri; Han A Li; Laura C Pinheiro; Graham T Wehmeyer; Mark Weiner; Weill Cornell COVID-19 Data Abstraction Consortium; Monika M Safford; Thomas R Campion; Curtis L Cole

doi:10.1016/j.ijmedinf.2021.104622


Int J Med Inform. 2022 Jan:157:104622. doi: 10.1016/j.ijmedinf.2021.104622. Epub 2021 Oct 21.

Authors

Andrew L Yin¹, Winston L Guo², Evan T Sholle³, Mangala Rajan⁴, Mark N Alshak⁵, Justin J Choi⁶, Parag Goyal⁶, Assem Jabri⁶, Han A Li⁵, Laura C Pinheiro⁴, Graham T Wehmeyer⁵, Mark Weiner⁷; Weill Cornell COVID-19 Data Abstraction Consortium⁸; Monika M Safford⁶, Thomas R Campion⁹, Curtis L Cole¹⁰

Affiliations

¹ Weill Cornell Medical College, Weill Cornell Medicine, New York, NY, United States; Department of Medicine, Weill Cornell Medicine, New York, NY, United States. Electronic address: aly27@cornell.edu.
² Weill Cornell Medical College, Weill Cornell Medicine, New York, NY, United States.
³ Information Technologies & Services Department, Weill Cornell Medicine, New York, NY, United States.
⁴ Department of Medicine, Weill Cornell Medicine, New York, NY, United States.
⁵ Weill Cornell Medical College, Weill Cornell Medicine, New York, NY, United States; Department of Medicine, Weill Cornell Medicine, New York, NY, United States.
⁶ Division of General Internal Medicine, Weill Cornell Medicine, New York, NY, United States.
⁷ Department of Medicine, Weill Cornell Medicine, New York, NY, United States; Information Technologies & Services Department, Weill Cornell Medicine, New York, NY, United States.
⁸ Weill Cornell Medical College, Weill Cornell Medicine, New York, NY, United States; Information Technologies & Services Department, Weill Cornell Medicine, New York, NY, United States.
⁹ Information Technologies & Services Department, Weill Cornell Medicine, New York, NY, United States; Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, United States; Clinical and Translational Science Center, Weill Cornell Medicine, New York, NY, United States.
¹⁰ Department of Medicine, Weill Cornell Medicine, New York, NY, United States; Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, United States.

Abstract

Introduction: Data extraction from electronic health record (EHR) systems occurs through manual abstraction, automated extraction, or a combination of both. While each method has its strengths and weaknesses, both are necessary for retrospective observational research as well as for responding to sudden clinical events such as the COVID-19 pandemic. Assessing the strengths, weaknesses, and potential of these methods is important for refining our understanding of optimal approaches to extracting clinical data. We set out to assess automated and manual techniques for collecting medication use data in patients with COVID-19 to inform future observational studies that extract data from the EHR.

Materials and methods: For 4,123 COVID-positive patients hospitalized and/or seen in the emergency department at an academic medical center between 03/03/2020 and 05/15/2020, we compared medication use data for 25 medications or drug classes collected through manual abstraction and automated extraction from the EHR. Quantitatively, we assessed concordance using Cohen's kappa to measure interrater reliability; qualitatively, we audited observed discrepancies to determine the causes of inconsistencies.
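Cohen's kappa compares the observed agreement between two raters with the agreement expected by chance, kappa = (p_o - p_e) / (1 - p_e). A minimal sketch, assuming each medication is coded as a binary used/not-used flag per patient and that the manual and automated lists are aligned by patient; the function names and toy data are hypothetical and are not the study's code or data.

```python
import math
from typing import Sequence


def cohens_kappa(manual: Sequence[bool], automated: Sequence[bool]) -> float:
    """Cohen's kappa for two binary raters scoring the same patients (illustrative)."""
    if len(manual) != len(automated) or len(manual) == 0:
        raise ValueError("ratings must be non-empty and aligned by patient")
    n = len(manual)
    # Observed agreement: share of patients where both methods give the same flag.
    p_o = sum(m == a for m, a in zip(manual, automated)) / n
    # Marginal rate at which each method records the medication as given.
    p_manual = sum(manual) / n
    p_auto = sum(automated) / n
    # Chance agreement: both "yes" by chance plus both "no" by chance.
    p_e = p_manual * p_auto + (1 - p_manual) * (1 - p_auto)
    if p_e == 1.0:
        # Both methods are constant and identical; kappa is undefined (0/0).
        return math.nan
    return (p_o - p_e) / (1 - p_e)


def discrepancies(manual: Sequence[bool], automated: Sequence[bool]) -> list[int]:
    """Indices of patients where the two methods disagree (hypothetical helper)."""
    return [i for i, (m, a) in enumerate(zip(manual, automated)) if m != a]


# Toy example: 8 patients, one medication.
manual = [True, True, False, False, True, False, False, False]
automated = [True, False, False, False, True, False, True, False]
print(round(cohens_kappa(manual, automated), 4))  # 0.4667
print(discrepancies(manual, automated))  # [1, 6]
```

The disagreeing patient-medication pairs returned by `discrepancies` are the kind of records an audit of discrepancies would sample from.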

Results: For the 16 inpatient medications, 11 (69%) demonstrated moderate or better agreement; 7 of those demonstrated strong or almost perfect agreement. For the 9 outpatient medications, 3 (33%) demonstrated moderate agreement, but none achieved strong or almost perfect agreement. We audited 12% of all discrepancies (716/5,790) and, among those audited, observed three principal categories of error: human error in manual abstraction (26%), errors in the extract-transform-load (ETL) process or mapping of the automated extraction (41%), and abstraction-query mismatch (33%).
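The agreement labels above ("moderate", "strong", "almost perfect") correspond to named bands of kappa values. A minimal sketch, assuming McHugh-style cut points (0.60 for moderate, 0.80 for strong, above 0.90 for almost perfect); the abstract does not state the exact thresholds used, and the medication names and kappa values below are made up for illustration.

```python
# Assumed McHugh-style bands; the study's exact cut points are not given here.
def agreement_level(kappa: float) -> str:
    """Map a kappa value to a qualitative agreement label (assumed thresholds)."""
    if kappa > 0.90:
        return "almost perfect"
    if kappa >= 0.80:
        return "strong"
    if kappa >= 0.60:
        return "moderate"
    if kappa >= 0.40:
        return "weak"
    if kappa > 0.20:
        return "minimal"
    return "none"


def summarize(kappas: dict[str, float]) -> tuple[int, int, int]:
    """Return (total, moderate-or-better, strong-or-better) counts."""
    levels = [agreement_level(k) for k in kappas.values()]
    moderate_plus = sum(lv in {"moderate", "strong", "almost perfect"} for lv in levels)
    strong_plus = sum(lv in {"strong", "almost perfect"} for lv in levels)
    return len(levels), moderate_plus, strong_plus


# Hypothetical per-medication kappas (not study results).
example = {"drug_a": 0.92, "drug_b": 0.85, "drug_c": 0.65, "drug_d": 0.30}
total, moderate_plus, strong_plus = summarize(example)
print(total, moderate_plus, strong_plus)  # 4 3 2
print(f"{moderate_plus / total:.0%} moderate or better")  # 75% moderate or better
```

Applied to the reported counts, the same arithmetic gives 11/16 ≈ 69% for inpatient and 3/9 ≈ 33% for outpatient medications, and 716/5,790 ≈ 12% of discrepancies audited.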

Conclusion: Our findings suggest that many inpatient medications can be collected reliably through automated extraction, especially when abstraction instructions are designed with the underlying data architecture in mind. We discuss quality issues, concerns, and improvements for institutions to consider when designing their own data collection approach. During crises, institutions must decide how to allocate limited resources. We show that automated extraction of medications is feasible and make recommendations for improving future iterations.

Keywords: COVID-19; Chart review; Data quality; Electronic health record; Research data repositories.

Publication types

Research Support, N.I.H., Extramural

MeSH terms

COVID-19*
Data Collection
Electronic Health Records
Humans
Pandemics
Pharmaceutical Preparations*
Reproducibility of Results
Retrospective Studies
SARS-CoV-2

Substances

Pharmaceutical Preparations

Grants and funding

UL1 TR002384/TR/NCATS NIH HHS/United States