Modern Machiavelli? The illusion of ChatGPT-generated patient reviews in plastic and aesthetic surgery based on 9000 review classifications

J Plast Reconstr Aesthet Surg. 2024 Jan:88:99-108. doi: 10.1016/j.bjps.2023.10.119. Epub 2023 Oct 29.

Abstract

Background: Online patient reviews are crucial in guiding individuals who seek plastic surgery, but artificial chatbots pose a threat of disseminating fake reviews. This study aimed to compare real patient feedback with ChatGPT-generated reviews for the top five US plastic surgery procedures.

Methods: Thirty real patient reviews on rhinoplasty, blepharoplasty, facelift, liposuction, and breast augmentation were collected from RealSelf and used as templates for ChatGPT to generate matching patient reviews. Prolific users (n = 30) assessed 150 pairs of reviews to identify human-written and artificial intelligence (AI)-generated reviews. Patient reviews were further assessed using AI content detector software (Copyleaks AI).

Results: Among the 9000 classification tasks, 64.3% and 35.7% of reviews were classified as authentic and fake, respectively. On an average, the author (human versus machine) was correctly identified in 59.6% of cases, and this poor classification performance was consistent across all procedures. Patients with prior aesthetic treatment showed poorer classification performance than those without (p < 0.05). The mean character count in human-written reviews was significantly higher (p < 0.001) that that in AI-generated reviews, with a significant correlation between character count and participants' accuracy rate (p < 0.001). Emotional timbre of reviews differed significantly with "happiness" being more prevalent in human-written reviews (p < 0.001), and "disappointment" being more prevalent in AI reviews (p = 0.005). Copyleaks AI correctly classified 96.7% and 69.3% of human-written and ChatGPT-generated reviews, respectively.

Conclusion: ChatGPT convincingly replicates authentic patient reviews, even deceiving commercial AI detection software. Analyzing emotional tone and review length can help differentiate real from fake reviews, underscoring the need to educate both patients and physicians to prevent misinformation and mistrust.

Keywords: Artificial intelligence; ChatGPT; GPT-4; Large language model; Patient review; Plastic Surgery.

MeSH terms

  • Artificial Intelligence
  • Esthetics
  • Humans
  • Illusions*
  • Plastic Surgery Procedures*
  • Surgery, Plastic*