Man against machine reloaded: performance of a market-approved convolutional neural network in classifying a broad spectrum of skin lesions in comparison with 96 dermatologists working under less artificial conditions

H A Haenssle; C Fink; F Toberer; J Winkler; W Stolz; T Deinlein; R Hofmann-Wellenhof; A Lallas; S Emmert; T Buhl; M Zutt; A Blum; M S Abassi; L Thomas; I Tromme; P Tschandl; A Enk; A Rosenberger; Reader Study Level I and Level II Groups

doi:10.1016/j.annonc.2019.10.013

Ann Oncol. 2020 Jan;31(1):137-143. doi: 10.1016/j.annonc.2019.10.013.

Authors

H A Haenssle¹, C Fink², F Toberer², J Winkler², W Stolz³, T Deinlein⁴, R Hofmann-Wellenhof⁴, A Lallas⁵, S Emmert⁶, T Buhl⁷, M Zutt⁸, A Blum⁹, M S Abassi¹⁰, L Thomas¹¹, I Tromme¹², P Tschandl¹³, A Enk², A Rosenberger¹⁴; Reader Study Level I and Level II Groups

Collaborators

Reader Study Level I and Level II Groups:
Christina Alt, Marie Bachelerie, Sonali Bajaj, Alise Balcere, Sophie Baricault, Clément Barthaux, Yvonne Beckenbauer, Ines Bertlich, Andreas Blum, Marie-France Bouthenet, Sophie Brassat, Philipp Marcel Buck, Kristina Buder-Bakhaya, Maria-Letizia Cappelletti, Cécile Chabbert, Julie De Labarthe, Eveline DeCoster, Teresa Deinlein, Michèle Dobler, Daphnée Dumon, Steffen Emmert, Julie Gachon-Buffet, Mikhail Gusarov, Franziska Hartmann, Julia Hartmann, Anke Herrmann, Isabelle Hoorens, Eva Hulstaert, Raimonds Karls, Andreea Kolonte, Christian Kromer, Aimilios Lallas, Céline Le Blanc Vasseux, Annabelle Levy-Roy, Pawel Majenka, Marine Marc, Veronique Martin Bourret, Nadège Michelet-Brunacci, Christina Mitteldorf, Jean Paroissien, Camille Picard, Diana Plise, Valérie Reymann, Fabrice Ribeaudeau, Pauline Richez, Hélène Roche Plaine, Deborah Salik, Elke Sattler, Sarah Schäfer, Roland Schneiderbauer, Thierry Secchi, Karen Talour, Lukas Trennheuser, Alexander Wald, Priscila Wölbing, Pascale Zukervar

Affiliations

¹ Department of Dermatology, University of Heidelberg, Heidelberg, Germany. Electronic address: Holger.Haenssle@med.uni-heidelberg.de.
² Department of Dermatology, University of Heidelberg, Heidelberg, Germany.
³ Department of Dermatology, Allergology and Environmental Medicine II, Munich, Germany.
⁴ Department of Dermatology and Venerology, Medical University of Graz, Graz, Austria.
⁵ First Department of Dermatology, Aristotle University, Thessaloniki, Greece.
⁶ Department of Dermatology, University of Rostock, Rostock, Germany.
⁷ Department of Dermatology, University of Göttingen, Göttingen, Germany.
⁸ Department of Dermatology and Allergology, Klinikum Bremen-Mitte, Bremen, Germany.
⁹ Office Based Clinic of Dermatology, Konstanz, Germany.
¹⁰ Faculty of Computer Science and Mathematics, University of Passau, Passau, Germany.
¹¹ Department of Dermatology, Lyons Cancer Research Center, Lyon 1 University, Lyon, France.
¹² Department of Dermatology, Université Catholique de Louvain, St Luc University Hospital, Brussels, Belgium.
¹³ Department of Dermatology, Medical University of Vienna, Vienna, Austria.
¹⁴ Department of Genetic Epidemiology, University of Goettingen, Goettingen, Germany.

PMID: 31912788

Abstract

Background: Convolutional neural networks (CNNs) efficiently differentiate skin lesions by image analysis. Studies that compare a market-approved CNN across a broad range of diagnoses with dermatologists working under less artificial conditions are lacking.

Materials and methods: One hundred cases of pigmented/non-pigmented skin cancers and benign lesions were used for a two-level reader study with 96 dermatologists (level I: dermoscopy only; level II: clinical close-up images, dermoscopy, and textual information). Additionally, dermoscopic images were classified by a CNN approved for the European market as a medical device (Moleanalyzer Pro, FotoFinder Systems, Bad Birnbach, Germany). Primary endpoints were the sensitivity and specificity of the CNN's dichotomous classification in comparison with the dermatologists' management decisions. Secondary endpoints included the dermatologists' diagnostic decisions, their performance according to their level of experience, and the CNN's area under the curve (AUC) of receiver operating characteristics (ROC).
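For readers less familiar with these endpoints, the following is a minimal sketch of how sensitivity, specificity, and ROC AUC can be computed from a dichotomous classification and a continuous score. It is not the study's analysis code. The function names, the 0.5 cutoff, and the toy labels and scores are hypothetical and chosen only for illustration.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = malignant, 0 = benign)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a malignant case outscores a benign one.

    Ties count as half a win (the Mann-Whitney formulation of the AUC).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, NOT taken from the study.
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]
cutoff = 0.5  # assumed threshold for the dichotomous call
y_pred = [1 if s >= cutoff else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC={auc:.3f}")
```

Moving the cutoff trades sensitivity against specificity, while the AUC summarizes performance across all cutoffs. That is why a comparison at a fixed specificity, as reported in the results, requires the continuous score and not only the dichotomous output.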

Results: The CNN achieved a sensitivity, specificity, and ROC AUC with corresponding 95% confidence intervals (CI) of 95.0% (95% CI 83.5% to 98.6%), 76.7% (95% CI 64.6% to 85.6%), and 0.918 (95% CI 0.866 to 0.970), respectively. In level I, the dermatologists' management decisions showed a mean sensitivity and specificity of 89.0% (95% CI 87.4% to 90.6%) and 80.7% (95% CI 78.8% to 82.6%), respectively. With level II information, the sensitivity significantly improved to 94.1% (95% CI 93.1% to 95.1%; P < 0.001), while the specificity remained unchanged at 80.4% (95% CI 78.4% to 82.4%; P = 0.97). When fixing the CNN's specificity at the mean specificity of the dermatologists' management decisions in level II (80.4%), the CNN's sensitivity was almost equal to that of human raters, at 95% (95% CI 83.5% to 98.6%) versus 94.1% (95% CI 93.1% to 95.1%); P = 0.1. In contrast, dermatologists were outperformed by the CNN in their level I management decisions and level I and II diagnostic decisions. More experienced dermatologists frequently surpassed the CNN's performance.
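The width of the CNN's confidence intervals reflects the small number of cases behind each proportion. As a rough illustration, the sketch below computes a Wilson score interval for a binomial proportion. Two assumptions are made that the abstract does not state: that the intervals are of the Wilson type, and that the 100 cases split into 40 malignant and 60 benign lesions, so that 38/40 and 46/60 reproduce the reported 95.0% and 76.7%. The function name `wilson_interval` is hypothetical.

```python
import math


def wilson_interval(successes, total, z=1.959964):
    """Wilson score interval for a binomial proportion (z = 1.959964 for 95%)."""
    p = successes / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half_width, center + half_width


# Assumed counts (not given in the abstract): 38 of 40 malignant lesions
# correctly flagged, 46 of 60 benign lesions correctly cleared.
sens_lo, sens_hi = wilson_interval(38, 40)
spec_lo, spec_hi = wilson_interval(46, 60)

print(f"sensitivity 95% CI: {sens_lo:.1%} to {sens_hi:.1%}")
print(f"specificity 95% CI: {spec_lo:.1%} to {spec_hi:.1%}")
```

Under these assumed counts the intervals come out at roughly 83.5% to 98.6% and 64.6% to 85.6%, in line with the reported values. The dermatologists' intervals are much narrower because they are means over 96 readers and not single proportions from one classifier.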

Conclusions: Under less artificial conditions and in a broader spectrum of diagnoses, the CNN and most dermatologists performed on the same level. Dermatologists are trained to integrate information from a range of sources, which renders comparative studies based solely on a single case image inadequate.

Keywords: Moleanalyzer Pro; deep learning; dermoscopy; melanoma; neural network; skin cancer.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Dermatologists
Dermoscopy
Germany
Humans
Male
Melanoma* / diagnostic imaging
Neural Networks, Computer
Skin Neoplasms*