Human versus artificial intelligence-generated arthroplasty literature: A single-blinded analysis of perceived communication, quality, and authorship source

Int J Med Robot. 2024 Feb;20(1):e2621. doi: 10.1002/rcs.2621.

Abstract

Background: Large language models (LLMs) have unknown implications for medical research. This study assessed whether LLM-generated abstracts are distinguishable from human-written abstracts and compared their perceived quality.

Methods: The LLM ChatGPT was used to generate 20 arthroplasty abstracts (AI-generated) based on full-text manuscripts, which were compared to originally published abstracts (human-written). Six blinded orthopaedic surgeons rated abstracts on overall quality, communication, and confidence in the authorship source. Authorship-confidence scores were compared to a test value representing complete inability to discern authorship.

Results: Modestly increased confidence in human authorship was observed for human-written abstracts compared with AI-generated abstracts (p = 0.028), though AI-generated abstract authorship-confidence scores were statistically consistent with inability to discern authorship (p = 0.999). Overall abstract quality was higher for human-written abstracts (p = 0.019).

Conclusions: Authorship-confidence ratings for AI-generated abstracts indicated that reviewers had difficulty discerning authorship, but these abstracts did not achieve the perceived quality of human-written abstracts. Caution is warranted in implementing LLMs in scientific writing.

Keywords: ChatGPT; artificial intelligence; large language models; medical literature; total hip arthroplasty; total knee arthroplasty.

Publication types

  • Randomized Controlled Trial

MeSH terms

  • Arthroplasty
  • Artificial Intelligence*
  • Authorship*
  • Communication
  • Humans
  • Language