Natural language processing of clinical notes enables early inborn error of immunity risk ascertainment

Kirk Roberts; Aaron T Chin; Klaus Loewy; Lisa Pompeii; Harold Shin; Nicholas L Rider

doi:10.1016/j.jacig.2024.100224

J Allergy Clin Immunol Glob. 2024 Feb 2;3(2):100224. doi: 10.1016/j.jacig.2024.100224. eCollection 2024 May.

Authors

Kirk Roberts¹, Aaron T Chin², Klaus Loewy³, Lisa Pompeii⁴, Harold Shin⁵, Nicholas L Rider⁶,⁷

Affiliations

¹ McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Tex.
² Division of Immunology, Allergy, and Rheumatology, University of California, Los Angeles, Calif.
³ Texas Children's Hospital, Houston, Tex.
⁴ Department of Patient Services, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio.
⁵ College of Osteopathic Medicine, Liberty University, Lynchburg, Va.
⁶ Division of Health System & Implementation Science, Virginia Tech Carilion School of Medicine, Roanoke, Va.
⁷ Section of Allergy and Immunology, Carilion Clinic, Roanoke, Va.

Abstract

Background: There are now approximately 450 discrete inborn errors of immunity (IEI) described; however, diagnostic rates remain suboptimal. Use of structured health record data has proven useful for patient detection but may be augmented by natural language processing (NLP). Here we present a machine learning model that can distinguish patients from controls well in advance of the ultimate diagnosis date.

Objective: We sought to create an NLP machine learning algorithm that could identify IEI patients early during the disease course and shorten the diagnostic odyssey.

Methods: Our approach involved extracting a large corpus of IEI patient clinical-note text from a major referral center's electronic health record (EHR) system and a matched control corpus for comparison. We built text classifiers with simple machine learning methods and trained them on progressively longer time epochs before date of diagnosis.
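The epoch-based setup described in Methods can be illustrated with a minimal sketch. It assumes each note carries a date and that an epoch is defined by keeping only notes written at least a given number of months before diagnosis; the abstract does not specify the exact windowing. The function names `months_before` and `notes_in_epoch` and the example notes are hypothetical, not taken from the study.

```python
from datetime import date

def months_before(note_date: date, diagnosis_date: date) -> int:
    """Whole calendar months from note_date to diagnosis_date (negative if the note is later)."""
    months = (diagnosis_date.year - note_date.year) * 12
    months += diagnosis_date.month - note_date.month
    if diagnosis_date.day < note_date.day:
        months -= 1  # the final month is not yet complete
    return months

def notes_in_epoch(notes, diagnosis_date, min_months_before):
    """Keep the text of notes written at least min_months_before months before diagnosis."""
    return [
        text
        for note_date, text in notes
        if months_before(note_date, diagnosis_date) >= min_months_before
    ]

# Hypothetical notes for one patient (invented for illustration only).
notes = [
    (date(2015, 1, 10), "recurrent otitis media"),
    (date(2017, 6, 1), "second pneumonia this year"),
    (date(2018, 11, 20), "low IgG, referred to immunology"),
]
diagnosis = date(2019, 1, 15)

# One training corpus per cutoff: 0, 12, 24, and 36 months before diagnosis.
corpora = {m: notes_in_epoch(notes, diagnosis, m) for m in (0, 12, 24, 36)}
```

In this sketch, a larger cutoff leaves the classifier with less text, and none of it close to the diagnosis, which is what makes early prediction the harder task.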

Results: The top-performing NLP algorithm robustly distinguished cases from controls 36 months before ultimate clinical diagnosis (area under the precision-recall curve > 0.95). Corpus analysis demonstrated that statistically enriched, IEI-relevant terms were evident 24+ months before diagnosis, validating that clinical notes can provide a signal for early prediction of IEI.
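For readers unfamiliar with the metric reported in Results, the sketch below computes average precision, a common step-wise estimate of the area under the precision-recall curve. This is an assumption about how such a figure could be obtained; the abstract does not state how the authors calculated it. The function name `average_precision` and the labels and scores are hypothetical, and tied scores are not handled specially.

```python
def average_precision(labels, scores):
    """Average precision: mean of precision values at the rank of each true case.

    labels: 1 for a case, 0 for a control.
    scores: classifier scores, higher meaning more likely to be a case.
    """
    total_cases = sum(labels)
    if total_cases == 0:
        raise ValueError("at least one positive label is required")
    # Rank examples from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / total_cases

# Hypothetical labels and scores, invented for illustration only.
labels = [1, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.1]
ap = average_precision(labels, scores)
```

A value near 1 means that almost every case is ranked above almost every control. Unlike the area under the ROC curve, the baseline for this metric depends on the proportion of cases in the evaluation set.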

Conclusion: Mining EHR notes with NLP holds promise for improving early IEI patient detection.

Keywords: Natural language processing; artificial intelligence; diagnosis; inborn errors of immunity; machine learning; primary immunodeficiency; text mining.