Derivation of a natural language processing algorithm to identify febrile infants

J Hosp Med. 2022 Jan;17(1):11-18. doi: 10.1002/jhm.2732. Epub 2022 Jan 4.

Abstract

Background: Diagnostic codes can retrospectively identify samples of febrile infants, but sensitivity is low, resulting in many febrile infants eluding detection. To ensure study samples are representative, an improved approach is needed.

Objective: To derive and internally validate a natural language processing algorithm to identify febrile infants and compare its performance to diagnostic codes.

Methods: This cross-sectional study consisted of infants aged 0-90 days brought to one pediatric emergency department from January 2016 to December 2017. We aimed to identify infants with fever, defined as a documented temperature ≥38°C. We used clinical notes from 2017 to develop two rule-based algorithms to identify infants with fever and tested them on data from 2016. Using manual abstraction as the gold standard, we compared the performance of the two rule-based algorithms (Models 1, 2) with that of four previously published diagnostic code groups (Models 5-8) using area under the receiver operating characteristic curve (AUC), sensitivity, and specificity.
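The abstract does not describe the rules themselves. As a rough illustration of what a rule-based fever detector over free-text notes might look like, here is a minimal sketch. It assumes temperatures appear as a number followed by a C or F unit (e.g. "38.4 C", "101.2°F"). The regular expression, the plausibility bounds, and the function names `extract_temperatures_c` and `is_febrile` are hypothetical and are not the published Models 1 or 2. Only the ≥38°C threshold comes from the abstract.

```python
import re

# Hypothetical pattern (not the authors' rules): a 2-3 digit value, optional
# decimals, optional degree sign, then a C/F unit, e.g. "38.4 C", "101.2°F",
# "38 Celsius". The lookbehind stops matches starting mid-number.
TEMP_RE = re.compile(
    r"(?<![\d.])(\d{2,3}(?:\.\d+)?)\s*°?\s*([CF])(?:elsius|ahrenheit)?\b",
    re.IGNORECASE,
)

FEVER_THRESHOLD_C = 38.0  # fever definition stated in the abstract


def extract_temperatures_c(note: str) -> list[float]:
    """Return all plausible body temperatures found in a note, in Celsius."""
    temps = []
    for value, unit in TEMP_RE.findall(note):
        t = float(value)
        if unit.upper() == "F":
            t = (t - 32.0) * 5.0 / 9.0  # Fahrenheit -> Celsius
        # Assumed plausibility bounds to drop non-temperature numbers.
        if 30.0 <= t <= 45.0:
            temps.append(t)
    return temps


def is_febrile(note: str) -> bool:
    """Flag a note if any documented temperature is >= 38.0 °C."""
    # Small tolerance so 100.4 °F (conventionally 38.0 °C) is not lost
    # to floating-point rounding in the unit conversion.
    return any(t >= FEVER_THRESHOLD_C - 1e-6 for t in extract_temperatures_c(note))
```

For example, `is_febrile("Tmax 101.2°F at home")` is `True` and `is_febrile("T 37.6C at triage")` is `False`. This sketch deliberately ignores context (which reading, home versus department, copied-forward text) that a production rule set would likely need to handle.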

Results: For the test set (n = 1190 infants), 184 infants were febrile (15.5%). The AUCs (0.92-0.95) and sensitivities (86%-92%) of Models 1 and 2 were significantly greater than those of Models 5-8 (AUCs 0.67-0.74; sensitivities 20%-74%), with similar specificities (93%-99%). In contrast to Models 5-8, samples from Models 1 and 2 demonstrated characteristics similar to the gold standard, including fever prevalence, median age, and rates of bacterial infections, hospitalizations, and severe outcomes.
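The reported metrics all come from a 2×2 comparison of each model's yes/no label against the manually abstracted gold standard. The following is a minimal sketch of that computation; the function name `evaluate` and the toy labels in the usage note are hypothetical. The abstract does not state how AUC was computed for each model. If a model outputs only a yes/no label, its ROC curve has a single operating point, and the trapezoidal AUC reduces to (sensitivity + specificity) / 2, which is what this sketch assumes.

```python
def evaluate(predicted: list[bool], gold: list[bool]) -> dict[str, float]:
    """Compare binary model labels with gold-standard (manual abstraction) labels.

    Assumes both classes are present in `gold`; otherwise sensitivity or
    specificity is undefined and a ZeroDivisionError is raised.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    pairs = list(zip(predicted, gold))
    tp = sum(p and g for p, g in pairs)              # febrile, flagged
    fn = sum((not p) and g for p, g in pairs)        # febrile, missed
    tn = sum((not p) and (not g) for p, g in pairs)  # afebrile, not flagged
    fp = sum(p and (not g) for p, g in pairs)        # afebrile, flagged
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "prevalence": (tp + fn) / len(gold),
        "sensitivity": sensitivity,
        "specificity": specificity,
        # Single-operating-point ROC: trapezoidal area through
        # (0, 0), (1 - specificity, sensitivity), (1, 1).
        "auc": (sensitivity + specificity) / 2,
    }
```

With toy labels `gold = [True] * 4 + [False] * 6` and `predicted = [True, True, True, False] + [False] * 5 + [True]`, this gives sensitivity 0.75, specificity 5/6, and AUC about 0.79. The same prevalence arithmetic reproduces the test-set figure in the abstract: 184 / 1190 ≈ 0.155, or 15.5%.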

Conclusions: Findings suggest rule-based algorithms can accurately identify febrile infants with greater sensitivity while preserving specificity compared to diagnostic codes. If externally validated, rule-based algorithms may be important tools to create representative study samples, thereby improving generalizability of findings.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Child
  • Cross-Sectional Studies
  • Fever* / diagnosis
  • Humans
  • Infant
  • Natural Language Processing*
  • Retrospective Studies