Appraising the performance of ChatGPT in psychiatry using 100 clinical case vignettes

Russell Franco D'Souza; Shabbir Amanullah; Mary Mathew; Krishna Mohan Surapaneni

doi:10.1016/j.ajp.2023.103770

Asian J Psychiatr. 2023 Nov:89:103770. doi: 10.1016/j.ajp.2023.103770. Epub 2023 Sep 20.

Authors

Russell Franco D'Souza¹, Shabbir Amanullah², Mary Mathew³, Krishna Mohan Surapaneni⁴

Affiliations

¹ Professor of Organizational Psychological Medicine, International Institute of Organisational Psychological Medicine, 71 Cleeland Street, Dandenong Victoria, Melbourne, 3175 Australia.
² Division of Geriatric Psychiatry, Queen's University, 752 King Street West, Postal Bag 603 Kingston, ON K7L7X3.
³ Department of Pathology, Kasturba Medical College, Manipal Academy of Higher Education, Tiger Circle Road, Madhav Nagar, Manipal, Karnataka 576104.
⁴ Department of Biochemistry, Panimalar Medical College Hospital & Research Institute, Varadharajapuram, Poonamallee, Chennai - 600 123, Tamil Nadu, India; Departments of Medical Education, Molecular Virology, Research, Clinical Skills & Simulation, Panimalar Medical College Hospital & Research Institute, Varadharajapuram, Poonamallee, Chennai - 600 123, Tamil Nadu, India. Electronic address: krishnamohan.surapaneni@gmail.com.

PMID: 37812998
DOI: 10.1016/j.ajp.2023.103770

Abstract

Background: ChatGPT has emerged as the most advanced and rapidly developing large language model chatbot system. With potential ranging from answering simple queries to passing highly competitive medical exams, ChatGPT continues to impress scientists and researchers worldwide, prompting further discussion of its utility in various fields. One such field is Psychiatry. With suboptimal diagnosis and treatment, ensuring mental health and well-being is a challenge in many countries, particularly developing nations. In this regard, we evaluated the performance of ChatGPT 3.5 in Psychiatry using clinical cases to provide evidence-based information on the implications of ChatGPT 3.5 for enhancing mental health and well-being.

Methods: In this experimental study, ChatGPT 3.5 was used to initiate conversations and collect responses to clinical vignettes in Psychiatry. The replies to 100 clinical case vignettes were assessed by expert faculty members from the Department of Psychiatry. The cases represented 100 different psychiatric illnesses. We recorded and assessed the initial ChatGPT 3.5 responses. Each response was evaluated against the objective of the questions posed at the end of the case, and these objectives were divided into 10 categories. Each grade was derived from the mean of the scores provided by the evaluators. Graphs and tables were used to represent the grades.
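
The mean-based grading step described in the Methods can be sketched as follows. This is only an illustrative sketch, not the authors' procedure: the abstract does not state the scoring scale, the number of evaluators, or the cutoffs between grades, so the 0-10 scale, the `GRADE_CUTOFFS` values, the `grade_case` helper, and the example scores are all hypothetical.

```python
from statistics import mean

# Hypothetical cutoffs on an assumed 0-10 scale.
# The abstract does not report the actual rubric.
GRADE_CUTOFFS = [(8.0, "A"), (6.0, "B"), (4.0, "C")]


def grade_case(evaluator_scores):
    """Average the evaluators' scores for one case and map the mean to a grade."""
    avg = mean(evaluator_scores)
    for cutoff, grade in GRADE_CUTOFFS:
        if avg >= cutoff:
            return avg, grade
    # Anything below the lowest cutoff falls into the lowest grade.
    return avg, "D"


# Made-up scores from three hypothetical evaluators for two cases.
example_scores = {"case_001": [9, 8, 9], "case_002": [6, 7, 5]}
for case_id, scores in example_scores.items():
    avg, grade = grade_case(scores)
    print(f"{case_id}: mean={avg:.2f}, grade={grade}")
```

Averaging across evaluators before assigning a grade means that a single outlying score shifts the grade only when it moves the mean across a cutoff.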

Results: ChatGPT 3.5 fared extremely well in Psychiatry, receiving "Grade A" ratings in 61 of 100 cases, "Grade B" ratings in 31, and "Grade C" ratings in 8. Most queries concerned management strategies, followed by diagnosis, differential diagnosis, assessment, investigation, counselling, clinical reasoning, ethical reasoning, prognosis, and request acceptance. ChatGPT 3.5 performed especially well in generating management strategies, followed by diagnoses, for different psychiatric conditions. No responses were graded "D", indicating that there were no errors in the diagnosis or in the response for clinical care. The few responses that received a "Grade C" contained only a few discrepancies and missed some additional details.

Conclusion: Our study shows that ChatGPT 3.5 has appreciable knowledge and interpretation skills in Psychiatry. Thus, ChatGPT 3.5 undoubtedly has the potential to transform the field of Medicine, and we emphasize its utility in Psychiatry through the findings of our study. However, for any AI model to be successful, ensuring reliability, validation of information, proper guidelines, and an implementation framework is necessary.

Keywords: “Artificial Intelligence”; “ChatGPT”; “Mental health”; “Mental well-being”; “Psychiatry”.

MeSH terms

Communication
Diagnosis, Differential
Humans
Mental Disorders* / diagnosis
Mental Disorders* / drug therapy
Psychiatry*
Reproducibility of Results