Are Different Versions of ChatGPT's Ability Comparable to the Clinical Diagnosis Presented in Case Reports? A Descriptive Study

J Multidiscip Healthc. 2023 Dec 6:16:3825-3831. doi: 10.2147/JMDH.S441790. eCollection 2023.

Abstract

Objective: ChatGPT, an advanced language model developed by OpenAI, has the potential to transform clinical decision-making in medicine. Despite the growing popularity of research on ChatGPT, there is a paucity of studies assessing its suitability for clinical decision support. Our study examined ChatGPT's ability to respond in accordance with the diagnoses presented in case reports, with the intention of serving as a reference for clinical decision-making.

Methods: We included 147 case reports from the Chinese Medical Association Journal Database, covering a variety of diseases, from which primary and secondary diagnoses were derived. Each question was independently posed three times to GPT-3.5 and three times to GPT-4.0. The results were analyzed in terms of ChatGPT's mean scores and accuracy types.

Results: GPT-4.0 displayed moderate accuracy in primary diagnoses. As the number of inputs increased, a corresponding improvement in the accuracy of ChatGPT's outputs became evident. Notably, autoimmune diseases comprised the largest proportion of case reports, and the mean score for primary diagnosis showed statistically significant differences for autoimmune diseases.

Conclusion: Our findings suggest the potential practicality of using ChatGPT for clinical decision-making. To enhance its accuracy, ChatGPT should be integrated with existing electronic health record systems in the future.

Keywords: ChatGPT; artificial intelligence; case reports; clinical decision support systems.

Grants and funding

This work was supported by Shenzhen High-level Hospital Construction Fund (G2022006).