Thyroid Ultrasound Appropriateness Identification Through Natural Language Processing of Electronic Health Records

Cristian Soto Jacome; Danny Segura Torres; Jungwei W Fan; Ricardo Loor-Torres; Mayra Duran; Misk Al Zahidy; Esteban Cabezas; Mariana Borras-Osorio; David Toro-Tobon; Yuqi Wu; Yonghui Wu; Naykky Singh Ospina; Juan P Brito

doi:10.1016/j.mcpdig.2024.01.001


Mayo Clin Proc Digit Health. 2024 Mar;2(1):67-74. doi: 10.1016/j.mcpdig.2024.01.001. Epub 2024 Feb 1.

Affiliations

¹ Division of Endocrinology, Diabetes, Metabolism, and Nutrition, Department of Medicine, Knowledge and Evaluation Research Unit, Mayo Clinic, Rochester, MN.
² Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN.
³ Division of Endocrinology, Diabetes, Metabolism, and Nutrition, Mayo Clinic, Rochester, MN.
⁴ Department of Health Outcomes & Biomedical Informatics, University of Florida, Gainesville, FL.
⁵ Division of Endocrinology, Department of Medicine, University of Florida, Gainesville, FL.

Abstract

Objective: To address thyroid cancer overdiagnosis, we aimed to develop a natural language processing (NLP) algorithm to determine the appropriateness of thyroid ultrasounds (TUS).

Patients and methods: Between 2017 and 2021, we identified 18,000 patients who underwent TUS at Mayo Clinic and selected 628 for chart review to create a consensus-based ground-truth dataset. We developed a rule-based NLP pipeline to classify each TUS as appropriate (aTUS) or inappropriate (iTUS) using patients' clinical notes and additional metadata. In addition, we designed an abbreviated NLP pipeline (aNLP) that relies solely on labels from TUS order requisitions, to facilitate deployment at other health care systems. The dataset was split into a training set of 468 patients (75%) and a test set of 160 (25%); the former was used for rule development and the latter for performance evaluation.
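The abstract does not list the rules themselves. As a rough illustration of how a requisition-only pipeline such as the aNLP might work, the Python sketch below matches a free-text order indication against two lists of regular expressions. The pattern lists, the `classify_requisition` name, and the "unclassified" fallback are all hypothetical; they are not the study's rules.

```python
import re

# HYPOTHETICAL indication patterns, for illustration only. The study's actual
# rules are not given in the abstract.
APPROPRIATE_PATTERNS = [
    re.compile(p)
    for p in (
        r"\bpalpable\s+(?:thyroid\s+)?nodule\b",
        r"\bincidental(?:ly)?\b.*\bnodule\b",
        r"\bgoiter\b",
        r"\bcervical\s+lymphadenopathy\b",
        r"\bthyroid\s+cancer\s+surveillance\b",
    )
]
INAPPROPRIATE_PATTERNS = [
    re.compile(p)
    for p in (
        r"\bscreening\b",
        r"\bhypothyroidism\b",
        r"\babnormal\s+tsh\b",
    )
]


def classify_requisition(label: str) -> str:
    """Label one TUS order requisition as 'aTUS', 'iTUS', or 'unclassified'."""
    text = label.lower()
    # Appropriate indications are checked first, so they take precedence
    # when a requisition mentions both kinds of indication.
    if any(p.search(text) for p in APPROPRIATE_PATTERNS):
        return "aTUS"
    if any(p.search(text) for p in INAPPROPRIATE_PATTERNS):
        return "iTUS"
    # No rule fired: leave the case for chart review rather than guess.
    return "unclassified"


if __name__ == "__main__":
    for req in (
        "Palpable thyroid nodule on exam",
        "Hypothyroidism; incidental nodule on CT",
        "Screening",
        "Follow-up",
    ):
        print(f"{req!r} -> {classify_requisition(req)}")
```

Checking the appropriate-indication patterns first means a requisition such as "Hypothyroidism; incidental nodule on CT" is labeled aTUS; whether the study's pipeline resolves such conflicts the same way is an assumption.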

Results: In the training set, 449 patients (95.94%) were identified as aTUS and 19 (4.06%) as iTUS; in the test set, 155 (96.88%) were aTUS and 5 (3.12%) were iTUS. For detecting aTUS in the training set, the pipeline achieved a sensitivity of 0.99, specificity of 0.95, and positive predictive value (PPV) of 1.0. In the test set, it achieved a sensitivity of 0.96, specificity of 0.80, and PPV of 0.99. The aNLP pipeline showed similar performance.
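Because the test set contains only 5 iTUS cases, the rounded test-set metrics pin down the underlying confusion matrix: a specificity of 0.80 can only mean 4 of 5 iTUS cases were labeled correctly, and a sensitivity of 0.96 over 155 aTUS cases corresponds only to 149 correct (148/155 rounds to 0.95 and 150/155 to 0.97). These counts are a reconstruction from the rounded figures, not numbers stated in the abstract. The sketch below, treating aTUS as the positive class, recomputes the three reported metrics from them; `ConfusionMatrix` and `detection_metrics` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary task with aTUS as the positive class."""
    tp: int  # aTUS labeled aTUS
    fn: int  # aTUS labeled iTUS
    tn: int  # iTUS labeled iTUS
    fp: int  # iTUS labeled aTUS


def detection_metrics(cm: ConfusionMatrix) -> dict[str, float]:
    """Sensitivity, specificity, and PPV for detecting aTUS."""
    return {
        "sensitivity": cm.tp / (cm.tp + cm.fn),  # recall for aTUS
        "specificity": cm.tn / (cm.tn + cm.fp),  # recall for iTUS
        "ppv": cm.tp / (cm.tp + cm.fp),          # precision for aTUS
    }


# Test-set counts RECONSTRUCTED from the rounded metrics (not reported directly):
# 155 aTUS (149 correct) and 5 iTUS (4 correct).
test_cm = ConfusionMatrix(tp=149, fn=6, tn=4, fp=1)
test_metrics = detection_metrics(test_cm)

if __name__ == "__main__":
    for name, value in test_metrics.items():
        print(f"{name}: {value:.2f}")
```

The calculation also shows how fragile the specificity estimate is: with only 5 iTUS cases, each misclassified iTUS moves specificity by 0.20.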

Conclusion: The NLP models can accurately identify the appropriateness of a thyroid ultrasound from clinical documentation and order requisition information, a critical initial step toward evaluating the drivers and outcomes of TUS use and subsequent thyroid cancer overdiagnosis.
