ChatGPT-Exploring Its Role in Clinical Chemistry

Ridwan B Ibrahim; Anil K Chokkalla; Kaitlyn Levett; David Gustafson; Lily Olayinka; Sneha Kumar; Sridevi Devaraj

Ann Clin Lab Sci. 2023 Nov;53(6):835-839.

Authors

Ridwan B Ibrahim¹,², Anil K Chokkalla¹,², Kaitlyn Levett¹, David Gustafson¹, Lily Olayinka³, Sneha Kumar¹, Sridevi Devaraj⁴,²

Affiliations

¹ Department of Pathology and Immunology, Baylor College of Medicine, Houston, TX, USA.
² Department of Pathology, Texas Children's Hospital, Houston, TX, USA.
³ Alberta Precision Laboratories, Edmonton, Canada.
⁴ Department of Pathology and Immunology, Baylor College of Medicine, Houston, TX, USA; sxdevara@texaschildrens.org.

PMID: 38182139

Abstract

Objective: To evaluate the utility of artificial intelligence-powered language models (ChatGPT 3.5 and GPT-4) compared to trainees and clinical chemists in responding to common laboratory questions in the broad area of Clinical Chemistry.

Methods: Thirty-five questions drawn from real-life case scenarios, clinical consultations, and clinical chemistry testing were used to evaluate ChatGPT 3.5 and GPT-4 alongside clinical chemistry trainees (residents/fellows) and clinical chemistry faculty. Responses were scored by category and by years of experience.

Results: Senior chemistry faculty demonstrated superior accuracy, with 100% correct responses, compared to 90.5%, 82.9%, and 71.4% correct responses from junior chemistry faculty, fellows, and residents, respectively. All human groups outperformed both ChatGPT 3.5 and GPT-4, which generated 60% and 71.4% correct responses, respectively. Of the subcategories examined, ChatGPT 3.5 achieved 100% accuracy in endocrinology, while GPT-4 did not achieve 100% accuracy in any subcategory. GPT-4 performed better overall than ChatGPT 3.5, matching the residents' proportion of correct responses (71.4%), but performed worse than the human participants when both partially correct and incorrect responses were considered.

Conclusion: Despite the advances in AI-powered language models, ChatGPT 3.5 and GPT-4 cannot replace a trained pathologist in answering clinical chemistry questions. People, especially those not trained in clinical chemistry, should exercise caution when using chatbots to interpret test results.

MeSH terms

Artificial Intelligence*
Chemistry, Clinical*
Humans
Laboratories
Pathologists