Evaluating ChatGPT responses in the context of a 53-year-old male with a femoral neck fracture: a qualitative analysis

Eur J Orthop Surg Traumatol. 2024 Feb;34(2):927-955. doi: 10.1007/s00590-023-03742-4. Epub 2023 Sep 30.

Abstract

Purpose: The integration of artificial intelligence (AI) tools, such as ChatGPT, in clinical medicine and medical education has gained significant attention due to their potential to support decision-making and improve patient care. However, there is a need to evaluate the benefits and limitations of these tools in specific clinical scenarios.

Methods: This study used a case study approach within the field of orthopaedic surgery. A clinical case report featuring a 53-year-old male with a femoral neck fracture was used as the basis for evaluation. ChatGPT, a large language model, was asked to respond to clinical questions related to the case. The responses generated by ChatGPT were evaluated qualitatively, considering their relevance, justification, and alignment with the responses of real clinicians. Alternative dialogue protocols were also employed to assess the impact of additional prompts and contextual information on ChatGPT responses.

Results: ChatGPT generally provided clinically appropriate responses to the questions posed in the clinical case report. However, the level of justification and explanation varied across responses. Occasionally, clinically inappropriate responses and inconsistencies were observed across different dialogue protocols and on separate days.

Conclusions: The findings of this study highlight both the potential and the limitations of using ChatGPT in clinical practice. While ChatGPT demonstrated the ability to provide relevant clinical information, the inconsistent justification and occasional clinically inappropriate responses raise concerns about its reliability. These results underscore the importance of careful consideration and validation when using AI tools in healthcare. Further research and clinician training are necessary to integrate AI tools like ChatGPT effectively and to ensure their safe and reliable use in clinical decision-making.

Keywords: Artificial intelligence; ChatGPT; Clinical; Decision-making; Large language model; Orthopaedic surgery.

Publication types

  • Case Reports

MeSH terms

  • Artificial Intelligence
  • Clinical Decision-Making
  • Femoral Neck Fractures* / surgery
  • Humans
  • Male
  • Middle Aged
  • Orthopedic Procedures*
  • Reproducibility of Results