A systematic review and meta-analysis of artificial intelligence versus clinicians for skin cancer diagnosis

Maria Paz Salinas; Javiera Sepúlveda; Leonel Hidalgo; Dominga Peirano; Macarena Morel; Pablo Uribe; Veronica Rotemberg; Juan Briones; Domingo Mery; Cristian Navarrete-Dechent

doi:10.1038/s41746-024-01103-x

NPJ Digit Med. 2024 May 14;7(1):125. doi: 10.1038/s41746-024-01103-x.

Affiliations

¹ Department of Dermatology, Escuela de Medicina, Pontificia Universidad Católica de Chile, Santiago, Chile.
² Universidad Catolica-Evidence Center, Cochrane Chile Associated Center, Pontificia Universidad Católica de Chile, Santiago, Chile.
³ Melanoma and Skin Cancer Unit, Escuela de Medicina, Pontificia Universidad Católica de Chile, Santiago, Chile.
⁴ Dermatology Service, Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA.
⁵ Department of Oncology, Escuela de Medicina, Pontificia Universidad Católica de Chile, Santiago, Chile.
⁶ Department of Computer Science, Pontificia Universidad Católica de Chile, Santiago, Chile.
⁷ Department of Dermatology, Escuela de Medicina, Pontificia Universidad Católica de Chile, Santiago, Chile. ctnavarr@gmail.com.
⁸ Melanoma and Skin Cancer Unit, Escuela de Medicina, Pontificia Universidad Católica de Chile, Santiago, Chile. ctnavarr@gmail.com.

# Contributed equally.

Abstract

Research on artificial intelligence (AI) in dermatology has increased exponentially. The objective of this study was to perform a systematic review and meta-analysis evaluating the performance of AI algorithms for skin cancer classification in comparison with clinicians of different levels of expertise. Following PRISMA guidelines, three electronic databases (PubMed, Embase, and Cochrane Library) were screened for relevant articles up to August 2022. The quality of the studies was assessed using QUADAS-2. A meta-analysis of sensitivity and specificity was performed for the accuracy of AI and clinicians. Fifty-three studies were included in the systematic review, and 19 met the inclusion criteria for the meta-analysis. Considering all studies and all subgroups of clinicians, we found a sensitivity (Sn) of 87.0% and a specificity (Sp) of 77.1% for AI algorithms, and a Sn of 79.78% and Sp of 73.6% for all clinicians (overall); the differences were statistically significant for both Sn and Sp. The difference between AI (Sn 92.5%, Sp 66.5%) and generalists (Sn 64.6%, Sp 72.8%) was greater than the difference between AI and expert clinicians. The performance of AI algorithms (Sn 86.3%, Sp 78.4%) and expert dermatologists (Sn 84.2%, Sp 74.4%) was clinically comparable. Limitations of AI algorithms in clinical practice should be considered, and future studies should focus on real-world settings and on AI assistance.
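
As a reading aid, the pooled estimates quoted in the abstract can be compared with a few lines of arithmetic. The sketch below is illustrative only and is not the authors' analysis code: the names `Accuracy`, `sensitivity`, `specificity`, `REPORTED`, and `gap` are hypothetical, the two helper functions use the standard confusion-matrix definitions of Sn and Sp, and the percentages are copied from the abstract. It computes simple percentage-point gaps and does not reproduce the meta-analytic pooling or the significance tests.

```python
from typing import NamedTuple


class Accuracy(NamedTuple):
    sn: float  # sensitivity, in percent
    sp: float  # specificity, in percent


def sensitivity(tp: int, fn: int) -> float:
    """Standard definition: TP / (TP + FN), the share of malignant lesions detected."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Standard definition: TN / (TN + FP), the share of benign lesions cleared."""
    return tn / (tn + fp)


# Pooled (AI, clinicians) estimates exactly as reported in the abstract.
REPORTED = {
    "all clinicians": (Accuracy(87.0, 77.1), Accuracy(79.78, 73.6)),
    "generalists": (Accuracy(92.5, 66.5), Accuracy(64.6, 72.8)),
    "expert dermatologists": (Accuracy(86.3, 78.4), Accuracy(84.2, 74.4)),
}


def gap(ai: Accuracy, clinicians: Accuracy) -> Accuracy:
    """AI minus clinicians, in percentage points (rounded to hide float noise)."""
    return Accuracy(round(ai.sn - clinicians.sn, 2), round(ai.sp - clinicians.sp, 2))


for group, (ai, clinicians) in REPORTED.items():
    d = gap(ai, clinicians)
    print(f"AI vs {group}: Sn {d.sn:+.2f} pp, Sp {d.sp:+.2f} pp")
```

On these reported figures, the sensitivity gap is about 28 percentage points against generalists and about 2 against expert dermatologists, while AI specificity is lower than that of generalists.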

Publication types

Review