GPT-4 Performance for Neurologic Localization

Jung-Hyun Lee; Eunhee Choi; Robert McDougal; William W Lytton

doi:10.1212/CPJ.0000000000200293

Neurol Clin Pract. 2024 Jun;14(3):e200293. doi: 10.1212/CPJ.0000000000200293. Epub 2024 Mar 27.

Authors

Jung-Hyun Lee¹, Eunhee Choi¹, Robert McDougal¹, William W Lytton¹

Affiliation

¹ Department of Neurology (J-HL, WWL), State University of New York Downstate Health Sciences University; Department of Neurology (J-HL, WWL), Kings County Hospital; Department of Neurology (J-HL), Maimonides Medical Center, Brooklyn; Department of Internal Medicine (EC), Lincoln Medical Center, Bronx, NY; Department of Biostatistics (RM), Yale School of Public Health; Program in Computational Biology and Bioinformatics (RM); Wu-Tsai Institute (RM); Section of Biomedical Informatics and Data Science (RM), Yale School of Medicine, Yale University, New Haven, CT; and Department of Physiology and Pharmacology (WWL), State University of New York Downstate Health Sciences University, Brooklyn, NY.

Abstract

Background and objectives: In health care, large language models such as Generative Pretrained Transformers (GPTs), trained on extensive text datasets, have potential applications in reducing health care disparities across regions and populations. Previous software developed for lesion localization has been limited in scope. This study aims to evaluate the capability of GPT-4 for lesion localization based on clinical presentation.

Methods: GPT-4 was prompted with the history and neurologic physical examination (H&P) from published cases of acute stroke, followed by clinical reasoning questions asking for "single or multiple lesions," "side," and "brain region," using Zero-Shot Chain-of-Thought and Text Classification prompting. GPT-4 output from 3 separate trials for each of 46 cases was compared with imaging-based localization.
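The prompting approach described in Methods can be illustrated with a minimal sketch. The abstract does not give the study's actual prompt wording or answer options, so the label sets, the function name `build_localization_prompt`, and the sample H&P below are all hypothetical. The cue "Let's think step by step" is the phrase commonly used for zero-shot chain-of-thought prompting, and the fixed answer options stand in for the text classification step.

```python
from typing import Sequence

# Hypothetical label sets for illustration only; the abstract names
# "cerebellum" but does not list the study's full set of answer options.
LESION_COUNT_LABELS = ("single", "multiple")
SIDE_LABELS = ("left", "right", "bilateral")
REGION_LABELS = ("cerebral hemisphere", "brainstem", "cerebellum")


def build_localization_prompt(
    hp_text: str,
    side_labels: Sequence[str] = SIDE_LABELS,
    region_labels: Sequence[str] = REGION_LABELS,
) -> str:
    """Combine an H&P with a zero-shot chain-of-thought cue and fixed answer options."""
    questions = [
        "1. Single or multiple lesions? Choose one of: "
        + ", ".join(LESION_COUNT_LABELS) + ".",
        "2. Side? Choose one of: " + ", ".join(side_labels) + ".",
        "3. Brain region? Choose one of: " + ", ".join(region_labels) + ".",
    ]
    parts = [
        "History and neurologic examination:",
        hp_text.strip(),
        "",
        "Answer the following questions about lesion localization.",
        *questions,
        "",
        # Zero-shot chain-of-thought cue: ask for reasoning before the answers.
        "Let's think step by step, then give the three final answers.",
    ]
    return "\n".join(parts)


# Invented example H&P, not taken from the study's cases.
example_prompt = build_localization_prompt(
    "A 68-year-old man with sudden left arm weakness and left facial droop."
)
```

The resulting string would be sent to the model once per trial; the study ran 3 trials per case, which this sketch does not automate.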

Results: GPT-4 successfully processed raw text from H&P to generate accurate neuroanatomical localization and detailed clinical reasoning. In the trial-based analysis, specificity, sensitivity, precision, and F1-score were 0.87, 0.74, 0.75, and 0.74, respectively, for side and 0.94, 0.85, 0.84, and 0.85, respectively, for brain region. Metrics for individual class labels within brain region were similarly high for all regions except the cerebellum, and results were also similar when all 3 trials were considered together to examine metrics by case. Errors arose from extrinsic causes (inadequate information in the published cases) and intrinsic causes (failures of logic or an inadequate knowledge base).
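For readers less familiar with the four metrics reported in Results, the sketch below shows how they are conventionally computed for one class label treated as positive against all others. It uses the standard definitions only; the abstract does not state how the study aggregated per-class values, so no averaging scheme is assumed here. The function name and the toy labels are hypothetical and are not study data.

```python
from typing import Dict, Sequence


def one_vs_rest_metrics(
    y_true: Sequence[str], y_pred: Sequence[str], label: str
) -> Dict[str, float]:
    """Specificity, sensitivity, precision, and F1 for one label versus the rest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    tn = sum(1 for t, p in pairs if t != label and p != label)

    def ratio(num: float, den: float) -> float:
        # Return 0.0 when the denominator is zero instead of raising.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)  # also called recall
    specificity = ratio(tn, tn + fp)
    precision = ratio(tp, tp + fp)
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "specificity": specificity,
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
    }


# Toy labels invented for illustration: imaging-based side vs. model output.
imaging_side = ["left", "left", "right", "right", "left", "right"]
model_side = ["left", "right", "right", "right", "left", "left"]
left_metrics = one_vs_rest_metrics(imaging_side, model_side, "left")
```

In a trial-based analysis each of the 3 outputs per case would contribute its own (imaging label, model label) pair to these counts.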

Discussion: This study reveals capabilities of GPT-4 in the localization of acute stroke lesions, showing a potential future role as a clinical tool in neurology.