ChatGPT as an information tool in rhinology. Can we trust each other today?

Juan Riestra-Ayora; Cristina Vaduva; Jonathan Esteban-Sánchez; María Garrote-Garrote; Carlos Fernández-Navarro; Carolina Sánchez-Rodríguez; Eduardo Martin-Sanz

doi:10.1007/s00405-024-08581-5

Eur Arch Otorhinolaryngol. 2024 Mar 4. doi: 10.1007/s00405-024-08581-5. Online ahead of print.

Authors

Juan Riestra-Ayora¹ ², Cristina Vaduva³ ⁴, Jonathan Esteban-Sánchez³ ⁴, María Garrote-Garrote⁴, Carlos Fernández-Navarro⁴, Carolina Sánchez-Rodríguez³, Eduardo Martin-Sanz³ ⁴

Affiliations

¹ Department of Medicine, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Villaviciosa de Odón, 28670, Madrid, Spain. juan.riestra@hotmail.com.
² Department of Otolaryngology-Head and Neck Surgery, Hospital Universitario de Getafe, Carretera de Toledo, Km 12.500, Getafe, 28905, Madrid, Spain. juan.riestra@hotmail.com.
³ Department of Medicine, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Villaviciosa de Odón, 28670, Madrid, Spain.
⁴ Department of Otolaryngology-Head and Neck Surgery, Hospital Universitario de Getafe, Carretera de Toledo, Km 12.500, Getafe, 28905, Madrid, Spain.

PMID: 38436756
DOI: 10.1007/s00405-024-08581-5

Abstract

Purpose: ChatGPT (Chat-Generative Pre-trained Transformer) has proven to be a powerful information tool on various topics, including healthcare. The system draws on information obtained from the Internet, which is not always reliable. Few studies have so far analyzed the validity of its responses in rhinology. Our work aims to assess the quality and reliability of the information ChatGPT provides on the main rhinological pathologies.

Methods: We asked the default ChatGPT version (GPT-3.5) 65 questions about the most prevalent pathologies in rhinology. The questions focused on causes, risk factors, treatments, prognosis, and outcomes. We used the Discern questionnaire and a hexagonal radar schema to evaluate the quality of the information, and Fleiss's kappa to determine the level of agreement between observers.
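
As an illustration of the agreement statistic named in the Methods, the following is a minimal sketch of how Fleiss's kappa can be computed from per-question ratings. It is not the authors' analysis code. The function name `fleiss_kappa`, the 1 to 5 scoring scale, the number of raters, and the example ratings are all assumptions made for illustration; the abstract does not report the raw ratings or the number of observers.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss's kappa for a list of items, each rated by the same number of raters.

    ratings    -- list of lists; ratings[i] holds the scores given to item i
    categories -- all possible scores (assumed here to be 1..5)
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    # counts[i][k] = number of raters who gave item i the score k
    counts = [Counter(r) for r in ratings]

    # Observed agreement: mean over items of the share of agreeing rater pairs
    p_items = [
        (sum(c[k] ** 2 for k in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(p_items) / n_items

    # Chance agreement: sum of squared overall category proportions
    total = n_items * n_raters
    p_expected = sum((sum(c[k] for c in counts) / total) ** 2 for k in categories)

    if p_expected == 1.0:
        # Every rating falls in one category; kappa is undefined
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical ratings: 4 questions, 3 observers each (NOT the study's data)
example = [
    [4, 4, 5],
    [3, 3, 3],
    [5, 4, 5],
    [2, 3, 2],
]
print(round(fleiss_kappa(example, categories=range(1, 6)), 3))
```

On the commonly used Landis and Koch scale, kappa values between 0.61 and 0.80 are labeled substantial agreement.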

Results: The overall evaluation with the Discern questionnaire yielded a score of 4.05 (± 0.6). Results in the Reliability section were worse, with an average score of 3.18 (± 1.77); this score is affected by the responses to questions about the source of the information provided. The average score for the Quality section was 3.59 (± 1.18). Fleiss's kappa showed substantial agreement between observers, with κ = 0.69 (p < 0.001).

Conclusion: ChatGPT's answers are accurate and reliable. It generates a simple and understandable description of the pathology for the patient's benefit. Our team considers that ChatGPT could be a useful tool for providing information, subject to prior supervision by a health professional.

Keywords: Artificial intelligence; ChatGPT; Chatbot; Healthcare; Natural language processing; Rhinology.