Diagnosis in Bytes: Comparing the Diagnostic Accuracy of Google and ChatGPT 3.5 as an Educational Support Tool

Int J Environ Res Public Health. 2024 May 1;21(5):580. doi: 10.3390/ijerph21050580.

Abstract

Background: The adoption of advanced digital technologies as diagnostic support tools in healthcare is a clear trend, accelerated by the COVID-19 pandemic. However, their accuracy in suggesting diagnoses remains controversial and needs to be explored. We aimed to evaluate and compare the diagnostic accuracy of two freely accessible internet search tools: Google and ChatGPT 3.5.

Methods: To assess the effectiveness of both platforms, we evaluated them on a sample of 60 clinical cases related to urological pathologies. We organized the cases into two categories: (i) prevalent conditions, compiled using the most common symptoms as outlined by EAU and UpToDate guidelines, and (ii) unusual disorders, identified through case reports published in the 'Urology Case Reports' journal from 2022 to 2023. To determine the accuracy of each platform, each response was classified into one of three categories: "correct diagnosis", "likely differential diagnosis", or "incorrect diagnosis". A group of experts evaluated the responses in a blinded and randomized manner.
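The three-category scoring described above can be illustrated with a minimal Python sketch. The function name `summarize` and the example ratings are hypothetical and are not study data; the abstract does not describe how the authors tabulated their results, so this only shows one plausible way to turn per-case expert labels into percentages.

```python
from collections import Counter

# The three outcome categories named in the Methods.
CATEGORIES = (
    "correct diagnosis",
    "likely differential diagnosis",
    "incorrect diagnosis",
)


def summarize(ratings):
    """Return the percentage of ratings that fall in each category.

    `ratings` is a list with one category label per clinical case.
    """
    if not ratings:
        raise ValueError("ratings must not be empty")
    unknown = set(ratings) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    counts = Counter(ratings)
    total = len(ratings)
    return {category: 100 * counts[category] / total for category in CATEGORIES}


# Hypothetical ratings for four cases (illustration only, not study data).
example_ratings = [
    "correct diagnosis",
    "correct diagnosis",
    "likely differential diagnosis",
    "incorrect diagnosis",
]
example_summary = summarize(example_ratings)
```

With these four made-up ratings, `example_summary` assigns 50% to "correct diagnosis" and 25% to each of the other two categories.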

Results: For commonly encountered urological conditions, Google gave the correct diagnosis in 53.3% of cases and a likely differential diagnosis in a further 23.3%; the remaining outcomes were incorrect. ChatGPT 3.5 outperformed Google, giving the correct diagnosis in 86.6% of cases and a likely differential diagnosis in 13.3%, with no incorrect diagnoses. For unusual disorders, Google failed to deliver any correct diagnoses but proposed a likely differential diagnosis in 20% of cases. ChatGPT 3.5 identified the correct diagnosis in 16.6% of rare cases and offered a likely differential diagnosis in half of the cases.
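As a rough consistency check, the reported percentages can be converted back into case counts. The sketch below assumes the 60 cases were split evenly, 30 per category; the abstract does not state the split, so `CASES_PER_CATEGORY` and the resulting counts are inferred rather than reported figures.

```python
# ASSUMPTION: 60 cases split evenly into 30 common and 30 rare cases.
# The abstract does not state the split; it is inferred from the percentages.
CASES_PER_CATEGORY = 30

# Percentages as reported in the Results: (case type, platform, outcome).
reported = {
    ("common", "Google", "correct"): 53.3,
    ("common", "Google", "differential"): 23.3,
    ("common", "ChatGPT 3.5", "correct"): 86.6,
    ("common", "ChatGPT 3.5", "differential"): 13.3,
    ("common", "ChatGPT 3.5", "incorrect"): 0.0,
    ("rare", "Google", "correct"): 0.0,
    ("rare", "Google", "differential"): 20.0,
    ("rare", "ChatGPT 3.5", "correct"): 16.6,
    ("rare", "ChatGPT 3.5", "differential"): 50.0,
}


def implied_count(percent, n=CASES_PER_CATEGORY):
    """Nearest whole number of cases matching `percent` out of `n`."""
    return round(percent / 100 * n)


implied = {key: implied_count(pct) for key, pct in reported.items()}

for (case_type, platform, outcome), count in implied.items():
    print(f"{case_type:6} {platform:11} {outcome:12} {count}/{CASES_PER_CATEGORY}")
```

Under this assumption every reported percentage corresponds to a whole number of cases to within 0.1 percentage points (for example, 53.3% matches 16 of 30 and 86.6% matches 26 of 30), which suggests the reported values were truncated rather than rounded.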

Conclusion: ChatGPT 3.5 demonstrated higher diagnostic accuracy than Google for both common and rare urological cases. ChatGPT 3.5 showed satisfactory accuracy when diagnosing common cases, yet its performance in identifying rare conditions remains limited.

Keywords: artificial intelligence; diagnosis; medical education; medical informatics applications; urology.

Publication types

  • Comparative Study

MeSH terms

  • COVID-19 / diagnosis
  • Diagnosis, Differential
  • Humans
  • Internet
  • SARS-CoV-2
  • Search Engine*
  • Urologic Diseases / diagnosis

Grants and funding

This research received no external funding.