Development of a natural language processing algorithm to detect chronic cough in electronic health records

Vishal Bali; Jessica Weaver; Vladimir Turzhitsky; Jonathan Schelfhout; Misti L Paudel; Erin Hulbert; Jesse Peterson-Brandt; Anne-Marie Guerra Currie; Dylan Bakka

doi:10.1186/s12890-022-02035-6

BMC Pulm Med. 2022 Jun 28;22(1):256. doi: 10.1186/s12890-022-02035-6.

Authors

Vishal Bali¹, Jessica Weaver², Vladimir Turzhitsky², Jonathan Schelfhout², Misti L Paudel³ ⁴, Erin Hulbert³, Jesse Peterson-Brandt³, Anne-Marie Guerra Currie⁵, Dylan Bakka⁵

Affiliations

¹ Center for Observational and Real-World Evidence (CORE), Merck & Co., Inc., Rahway, NJ, USA. vishal.bali@merck.com.
² Center for Observational and Real-World Evidence (CORE), Merck & Co., Inc., Rahway, NJ, USA.
³ Health Economics and Outcomes Research (HEOR), Optum Insight, Eden Prairie, MN, USA.
⁴ Henry M. Jackson Foundation for the Advancement of Military Medicine, Bethesda, MD, USA.
⁵ Optum Enterprise Analytics (OEA), Optum Insight, Eden Prairie, MN, USA.

Abstract

Background: Chronic cough (CC) is difficult to identify in electronic health records (EHRs) due to the lack of specific diagnostic codes. We developed a natural language processing (NLP) model to identify cough in free-text provider notes in EHRs from multiple health care providers. The objectives were to use the model in a rules-based CC algorithm to identify individuals with CC from EHRs and to describe their demographic and clinical characteristics.

Methods: This was a retrospective observational study of enrollees in Optum's Integrated Clinical + Claims Database. Participants were 18-85 years of age with medical and pharmacy health insurance coverage between January 2016 and March 2017. A labeled reference standard data set was constructed by manually annotating 1000 randomly selected provider notes from the EHRs of enrollees with ≥ 1 cough mention. An NLP model was developed to extract positive or negated cough contexts. Cough encounters were identified from NLP-detected cough mentions, cough diagnoses, and medications. Patients with ≥ 3 encounters spanning at least 56 days within 120 days were defined as having CC.
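
The CC definition above (≥ 3 cough encounters spanning at least 56 days within 120 days) can be illustrated with a minimal Python sketch. This is not the study's implementation: the function name `has_chronic_cough`, its parameters, and the choice to count encounters on distinct calendar days are assumptions made for illustration only.

```python
from datetime import date


def has_chronic_cough(encounter_dates, min_encounters=3,
                      min_span_days=56, max_window_days=120):
    """Return True if any run of encounters meets the rule.

    Assumption (not stated in the source): encounters on the same
    calendar day count once, and the span is measured from the first
    to the last encounter in the qualifying run.
    """
    days = sorted(set(encounter_dates))
    for i, first in enumerate(days):
        # The last encounter must leave room for min_encounters dates.
        for j in range(i + min_encounters - 1, len(days)):
            span = (days[j] - first).days
            if span > max_window_days:
                break  # later dates only widen the span further
            if span >= min_span_days:
                return True
    return False


# Hypothetical patient: three encounters over 60 days.
example = [date(2016, 1, 1), date(2016, 2, 1), date(2016, 3, 1)]
print(has_chronic_cough(example))
```

Scanning every start date keeps the sketch short; for large cohorts a sliding window over the sorted dates would avoid the nested loop.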

Results: The positive predictive value and sensitivity of the NLP algorithm were 0.96 and 0.68, respectively, for positive cough contexts, and 0.96 and 0.84, respectively, for negated cough contexts. Among the 4818 individuals identified as having CC, 37% were identified using NLP-identified cough mentions in provider notes alone, 16% by diagnosis codes and/or written medication orders, and 47% through a combination of provider notes and diagnosis codes/medications. Patients with CC had a mean age of 61.0 years, and 67.0% were female. The most prevalent comorbidities were respiratory infections (75%) and other lower respiratory disease (82%).
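
For readers less familiar with the two metrics reported above, the following sketch shows the standard definitions of positive predictive value, TP / (TP + FP), and sensitivity, TP / (TP + FN). The counts in the example are hypothetical and are not taken from the study, which does not report a confusion matrix in this abstract.

```python
def positive_predictive_value(true_pos, false_pos):
    """Share of model-flagged contexts that the reference standard confirms."""
    return true_pos / (true_pos + false_pos)


def sensitivity(true_pos, false_neg):
    """Share of reference-standard contexts that the model finds."""
    return true_pos / (true_pos + false_neg)


# Hypothetical counts, chosen only to illustrate the arithmetic.
tp, fp, fn = 8, 2, 2
print(round(positive_predictive_value(tp, fp), 2))
print(round(sensitivity(tp, fn), 2))
```

A high PPV with lower sensitivity, as reported for positive cough contexts, means the flagged mentions are mostly correct while some true mentions go undetected.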

Conclusions: Our EHR-based algorithm integrating NLP methodology with structured fields was able to identify a CC population. Machine learning-based approaches can therefore aid in patient selection for future CC research studies.

Keywords: Chronic cough; Cough; Diagnostic test accuracy study; Electronic health records; Natural language processing; Sensitivity and specificity.

Publication types

Observational Study

MeSH terms

Algorithms
Cough / diagnosis
Databases, Factual
Electronic Health Records*
Female
Humans
Male
Natural Language Processing*