Personalized Medicine in Urolithiasis: AI Chatbot-Assisted Dietary Management of Oxalate for Kidney Stone Prevention

Noppawit Aiumtrakul; Charat Thongprayoon; Chinnawat Arayangkool; Kristine B Vo; Chalothorn Wannaphut; Supawadee Suppadungsuk; Pajaree Krisanapan; Oscar A Garcia Valencia; Fawad Qureshi; Jing Miao; Wisit Cheungpasitporn

doi:10.3390/jpm14010107

J Pers Med. 2024 Jan 18;14(1):107. doi: 10.3390/jpm14010107.

Affiliations

¹ Department of Medicine, John A. Burns School of Medicine, University of Hawaii, Honolulu, HI 96813, USA.
² Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA.
³ Chakri Naruebodindra Medical Institute, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Samut Prakan 10540, Thailand.
⁴ Division of Nephrology, Department of Internal Medicine, Faculty of Medicine, Thammasat University, Pathum Thani 12120, Thailand.

Abstract

Accurate information regarding oxalate levels in foods is essential for managing patients with hyperoxaluria or oxalate nephropathy, and those susceptible to calcium oxalate stones. This study aimed to assess the reliability of chatbots in categorizing foods based on their oxalate content. We assessed the accuracy of ChatGPT-3.5, ChatGPT-4, Bard AI, and Bing Chat in classifying dietary oxalate content per serving into low (<5 mg), moderate (5-8 mg), and high (>8 mg) categories. A total of 539 food items were processed through each chatbot. Accuracy was compared between chatbots and stratified by dietary oxalate content category. Bard AI had the highest accuracy at 84%, followed by Bing Chat (60%), GPT-4 (52%), and GPT-3.5 (49%) (p < 0.001). Pairwise differences between chatbots were significant, except between GPT-4 and GPT-3.5 (p = 0.30). The accuracy of all chatbots decreased as the oxalate content category increased, but Bard AI retained the highest accuracy in every category. There was considerable variation in the accuracy of AI chatbots for classifying dietary oxalate content. Bard AI consistently showed the highest accuracy, followed by Bing Chat, GPT-4, and GPT-3.5. These results underline the potential of AI in dietary management for at-risk patient groups and the need for enhancements in chatbot algorithms for clinical accuracy.
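
The category thresholds and the accuracy measure described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `oxalate_category` and `accuracy` and the sample food values are hypothetical, and the sketch assumes that accuracy means the share of items whose chatbot-assigned category matches the reference category.

```python
from typing import Sequence


def oxalate_category(mg_per_serving: float) -> str:
    """Map oxalate content per serving (mg) to the three categories in the abstract.

    Thresholds follow the abstract: low (<5 mg), moderate (5-8 mg), high (>8 mg).
    """
    if mg_per_serving < 0:
        raise ValueError("oxalate content cannot be negative")
    if mg_per_serving < 5:
        return "low"
    if mg_per_serving <= 8:
        return "moderate"
    return "high"


def accuracy(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of items whose predicted category equals the reference category.

    Assumption: this is how accuracy is defined; the abstract does not spell it out.
    """
    if len(predicted) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


# Hypothetical reference values (mg per serving), for illustration only.
reference_mg = [2.0, 6.5, 12.0, 8.0]
reference_labels = [oxalate_category(mg) for mg in reference_mg]

# Hypothetical chatbot answers for the same four items.
chatbot_labels = ["low", "moderate", "moderate", "moderate"]

print(reference_labels)
print(accuracy(chatbot_labels, reference_labels))
```

Computing `accuracy` separately over the items in each reference category would give the stratified view mentioned in the abstract.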

Keywords: accuracy; chatbots; hyperoxaluria; kidney stone; nephrolithiasis; oxalate food content; oxalate nephropathy; personalized medicine; urolithiasis.

Grants and funding

This research received no external funding.