Foot and Ankle Patient Education Materials and Artificial Intelligence Chatbots: A Comparative Analysis

Aarav S Parekh; Joseph A S McCahon; Amy Nghe; David I Pedowitz; Joseph N Daniel; Selene G Parekh

doi:10.1177/19386400241235834

Foot Ankle Spec. 2024 Mar 19:19386400241235834. doi: 10.1177/19386400241235834. Online ahead of print.

Authors

Aarav S Parekh¹, Joseph A S McCahon², Amy Nghe¹, David I Pedowitz¹, Joseph N Daniel¹, Selene G Parekh¹

Affiliations

¹ Rothman Orthopaedic Institute, Philadelphia, Pennsylvania.
² Jefferson Health NJ, Stratford, New Jersey.

PMID: 38504411

Abstract

Background: This study compared foot and ankle patient education material generated by artificial intelligence (AI) chatbots with that of FootCareMD.org, the patient education website recommended by the American Orthopaedic Foot and Ankle Society (AOFAS).

Methods: ChatGPT, Google Bard, and Bing AI were used to generate patient education materials on 10 of the most common foot and ankle conditions. The content from these AI language model platforms was compared with the corresponding content on FootCareMD.org for accuracy of included information. For each of the 10 conditions, accuracy was assessed on the basis of included information regarding background, symptoms, causes, diagnosis, treatments, surgical options, recovery procedures, and risks or preventions.
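
The abstract does not specify the scoring procedure. As a minimal sketch of one way such a comparison could be computed, assume accuracy is the percentage of key points in the reference article that also appear in the chatbot output, averaged across conditions. The function names, the set-based matching rule, and the placeholder labels below are hypothetical illustrations, not the authors' method or data.

```python
from statistics import mean

# Content categories named in the Methods. How each category was scored
# is NOT stated in the abstract; the overlap rule below is an assumption.
CATEGORIES = [
    "background", "symptoms", "causes", "diagnosis", "treatments",
    "surgical options", "recovery procedures", "risks or preventions",
]


def coverage(reference_points, chatbot_points):
    """Percent of reference key points that also appear in the chatbot text.

    Both arguments are collections of key-point labels (hypothetical;
    in practice a human reviewer would judge whether a point is present).
    """
    reference = set(reference_points)
    if not reference:
        raise ValueError("reference must contain at least one key point")
    matched = reference & set(chatbot_points)
    return 100.0 * len(matched) / len(reference)


def mean_coverage(per_condition):
    """Average coverage over (reference_points, chatbot_points) pairs,
    one pair per condition."""
    return mean(coverage(ref, bot) for ref, bot in per_condition)


if __name__ == "__main__":
    # Placeholder labels only; these are not data from the study.
    pairs = [
        ({"p1", "p2"}, {"p1"}),
        ({"p3", "p4", "p5", "p6"}, {"p3", "p4", "p5", "p6"}),
    ]
    print(mean_coverage(pairs))
```

Under this assumed rule, content that a chatbot includes but the reference omits does not affect the score, so the measure reflects completeness relative to the reference rather than factual correctness.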

Results: Measured against the reference standard of the AOFAS website FootCareMD.org, the AI language model platforms consistently scored below 60% accuracy in every category of the articles analyzed. ChatGPT contained an average of 46.2% of the key content on FootCareMD.org across all included conditions. By comparison, Google Bard and Bing AI contained 36.5% and 28.0%, respectively (P < .005).

Conclusion: Patient education material on common foot and ankle conditions generated by AI language models had limited content accuracy across all 3 chatbot platforms.

Level of evidence: Level IV.

Keywords: Bing; ChatGPT; Google Bard; artificial intelligence; large language models; patient education.