Accuracy and Completeness of ChatGPT-Generated Information on Interceptive Orthodontics: A Multicenter Collaborative Study

Arjeta Hatia; Tiziana Doldo; Stefano Parrini; Elettra Chisci; Linda Cipriani; Livia Montagna; Giuseppina Lagana; Guia Guenza; Edoardo Agosta; Franceska Vinjolli; Meladiona Hoxha; Claudio D'Amelio; Nicolò Favaretto; Glauco Chisci

doi:10.3390/jcm13030735

J Clin Med. 2024 Jan 27;13(3):735. doi: 10.3390/jcm13030735.

Affiliations

¹ Orthodontics Postgraduate School, Department of Medical Biotechnologies, University of Siena, 53100 Siena, Italy.
² Oral Surgery Postgraduate School, Department of Medical Biotechnologies, University of Siena, 53100 Siena, Italy.
³ Orthodontics Postgraduate School, University of Ferrara, 44121 Ferrara, Italy.
⁴ Orthodontics Postgraduate School, University of Cagliari, 09121 Cagliari, Italy.
⁵ Orthodontics Postgraduate School, "Sapienza" University of Rome, 00185 Rome, Italy.
⁶ Orthodontics Postgraduate School, University of Milano, 20019 Milan, Italy.
⁷ Orthodontics Postgraduate School, University of Torino, 10024 Turin, Italy.
⁸ Orthodontics Postgraduate School, University of Roma Tor Vergata, 00133 Rome, Italy.
⁹ Orthodontics Postgraduate School, "Cattolica" University of Rome, 00168 Rome, Italy.
¹⁰ Orthodontics Postgraduate School, University of Chieti, 66100 Chieti, Italy.
¹¹ Orthodontics Postgraduate School, University of Trieste, 34100 Trieste, Italy.

Abstract

Background: This study aims to investigate the accuracy and completeness of ChatGPT in answering questions and solving clinical scenarios in interceptive orthodontics. Materials and Methods: Ten specialized orthodontists from ten Italian postgraduate orthodontics schools developed 21 open-ended clinical questions encompassing all of the subspecialties of interceptive orthodontics and 7 comprehensive clinical cases. Questions and scenarios were entered into ChatGPT4, and the resulting answers were evaluated by the researchers using predefined Likert scales for accuracy (range 1-6) and completeness (range 1-3). Results: For the open-ended questions, the overall median score was 4.9/6 for accuracy and 2.4/3 for completeness. The reviewers rated the accuracy of open-ended answers as entirely correct (score 6 on the Likert scale) in 40.5% of cases and completeness as entirely correct (score 3 on the Likert scale) in 50.5% of cases. For the clinical cases, the overall median score was 4.9/6 for accuracy and 2.5/3 for completeness. The reviewers rated the accuracy of clinical case answers as entirely correct in 46% of cases and their completeness as entirely correct in 54.3% of cases. Conclusions: The AI responses showed a high level of accuracy and completeness and a strong ability to solve difficult clinical cases, but the answers were not 100% accurate and complete. ChatGPT is not yet sophisticated enough to replace the intellectual work of human beings.
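The two summary statistics reported in the Results (a median Likert score and the share of answers given the top score) can be illustrated with a minimal Python sketch. The ratings below are hypothetical and are not data from the study. The function name `summarize` is likewise an assumption for illustration. The abstract does not state how scores were aggregated across reviewers, so this sketch simply treats each list as one flat set of ratings.

```python
from statistics import median


def summarize(ratings, top_score):
    """Return (median rating, percentage of ratings equal to top_score)."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    top_count = sum(1 for r in ratings if r == top_score)
    return median(ratings), 100.0 * top_count / len(ratings)


# Hypothetical reviewer ratings, NOT taken from the study.
accuracy_ratings = [6, 5, 6, 4, 5, 6, 3, 5]      # accuracy scale: 1-6
completeness_ratings = [3, 2, 3, 3, 2, 3, 1, 2]  # completeness scale: 1-3

acc_median, acc_top = summarize(accuracy_ratings, top_score=6)
comp_median, comp_top = summarize(completeness_ratings, top_score=3)

print(f"accuracy: median {acc_median}/6, top score in {acc_top:.1f}% of ratings")
print(f"completeness: median {comp_median}/3, top score in {comp_top:.1f}% of ratings")
```

With these made-up inputs, the accuracy median is 5.0 with 37.5% top scores, and the completeness median is 2.5 with 50.0% top scores.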

Keywords: ChatGPT; artificial bot; artificial intelligence; interceptive orthodontics; methodology; orthodontics.

Grants and funding

This research received no external funding.