Navigating the Landscape of Personalized Medicine: The Relevance of ChatGPT, BingChat, and Bard AI in Nephrology Literature Searches

J Pers Med. 2023 Sep 30;13(10):1457. doi: 10.3390/jpm13101457.

Abstract

Background and objectives: Literature reviews are foundational to understanding medical evidence. With AI tools like ChatGPT, Bing Chat, and Bard AI emerging as potential aids in this domain, this study aimed to individually assess their citation accuracy within Nephrology, comparing their performance in providing precise and reliable references.

Materials and methods: We generated prompts soliciting 20 references in Vancouver style for each of 12 Nephrology topics, using ChatGPT, Bing Chat, and Bard. We verified the existence and accuracy of the provided references using PubMed, Google Scholar, and Web of Science. We categorized the validity of the references from each AI chatbot as (1) incomplete, (2) fabricated, (3) inaccurate, or (4) accurate.

Results: A total of 199 (83%), 158 (66%), and 112 (47%) unique references were provided by ChatGPT, Bing Chat, and Bard, respectively. ChatGPT provided 76 (38%) accurate, 82 (41%) inaccurate, 32 (16%) fabricated, and 9 (5%) incomplete references. Bing Chat provided 47 (30%) accurate, 77 (49%) inaccurate, 21 (13%) fabricated, and 13 (8%) incomplete references. In contrast, Bard provided 3 (3%) accurate, 26 (23%) inaccurate, 71 (63%) fabricated, and 12 (11%) incomplete references. The most common error type across platforms was incorrect DOIs.

Conclusions: In the field of medicine, faultless adherence to research integrity is essential; even small citation errors cannot be tolerated. The outcomes of this investigation draw attention to inconsistent citation accuracy across the AI tools evaluated. Despite some promising results, the discrepancies identified call for cautious and rigorous vetting of AI-sourced references in medicine. Before becoming standard tools, such chatbots need substantial refinement to ensure consistent precision in their outputs.

Keywords: Bard AI; Bing Chat; ChatGPT; accuracy; literature review; nephrology references; personalized medicine; precision medicine.

Grants and funding

This research received no external funding.