Use and Application of Large Language Models for Patient Questions Following Total Knee Arthroplasty

J Arthroplasty. 2024 Mar 13:S0883-5403(24)00233-X. doi: 10.1016/j.arth.2024.03.017. Online ahead of print.

Abstract

Background: A consumer-focused health care model allows unprecedented access to information, but it equally warrants scrutiny of whether the patient health information provided is accurate and appropriate. Nurses play a large role in influencing patient satisfaction following total knee arthroplasty (TKA), but their time comes at a cost. A specific natural language artificial intelligence (AI) model, ChatGPT (Chat Generative Pre-trained Transformer), accumulated over 100 million users within months of launching. As such, we aimed to compare: (1) orthopaedic surgeons' evaluations of the appropriateness of answers to the most frequently asked patient questions after TKA; and (2) patients' comfort with having their postoperative questions answered, using answers provided by arthroplasty-trained nurses and by ChatGPT.

Methods: We prospectively created 60 questions based on the most commonly asked patient questions following TKA. Three fellowship-trained surgeons assessed the answers provided by arthroplasty-trained nurses and by ChatGPT-4 to each question. The surgeons graded each set of responses based on clinical judgment as: (1) "appropriate"; (2) "inappropriate," if the response contained clinically inappropriate information; or (3) "unreliable," if the responses provided inconsistent content. Patients' comfort level and trust in AI were assessed using a survey administered through Research Electronic Data Capture (REDCap), hosted at our local hospital.
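Although the study does not describe its analysis computationally, the grading workflow can be pictured as a simple aggregation step over the three surgeons' ratings. The Python sketch below is purely illustrative: the `consensus_grade` function and the majority-vote rule are assumptions, as the abstract does not state how disagreements among the surgeons were reconciled.

```python
from collections import Counter

def consensus_grade(surgeon_grades: list[str]) -> str:
    """Combine three surgeons' grades for one response.

    Hypothetical rule (not specified in the abstract): take the
    majority grade; if all three surgeons disagree, treat the
    response as "unreliable".
    """
    counts = Counter(surgeon_grades)
    grade, votes = counts.most_common(1)[0]
    return grade if votes >= 2 else "unreliable"

# Example: grades from three surgeons for a single ChatGPT response.
print(consensus_grade(["appropriate", "appropriate", "inappropriate"]))
# -> appropriate
```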

Results: The surgeons graded 44 of 60 responses (73.3%) from the arthroplasty-trained nurses and 44 of 60 (73.3%) from ChatGPT as "appropriate." Four nurse responses were graded "inappropriate" and one was graded "unreliable." For ChatGPT, 5 responses were graded "inappropriate" and none were graded "unreliable." Of the 253 patients surveyed, 136 (53.8%) were more comfortable with the answers provided by ChatGPT, compared to 86 (34.0%) who preferred the answers from the arthroplasty-trained nurses. However, 233 of the 253 patients (92.1%) were uncertain whether they would trust AI to answer their postoperative questions, and 127 (50.2%) answered that knowing an answer had been provided by ChatGPT would change their comfort in trusting it.

Conclusions: One potential use of ChatGPT is providing appropriate answers to patient questions after TKA. At our institution, this could reduce costs while maintaining patient satisfaction. Ultimately, successful implementation depends on the ability to provide information that is credible and aligned with the objectives of both physicians and patients.

Level of evidence: III.

Keywords: AI; ChatGPT; arthroplasty; large language models; machine learning; patient satisfaction.