Comparison of computer systems and ranking criteria for automatic melanoma detection in dermoscopic images

Kajsa Møllersen; Maciel Zortea; Thomas R Schopf; Herbert Kirchesch; Fred Godtliebsen

doi:10.1371/journal.pone.0190112

PLoS One. 2017 Dec 21;12(12):e0190112. doi: 10.1371/journal.pone.0190112. eCollection 2017.

Authors

Kajsa Møllersen¹, Maciel Zortea², Thomas R Schopf³, Herbert Kirchesch⁴, Fred Godtliebsen²

Affiliations

¹ Department of Community Medicine, UiT The Arctic University of Norway, Tromsø, Norway.
² Department of Mathematics and Statistics, UiT The Arctic University of Norway, Tromsø, Norway.
³ Norwegian Centre for E-health Research, University Hospital of North Norway, Tromsø, Norway.
⁴ Private office, Venloer Straße 107, 50259 Pulheim, Germany.

Abstract

Melanoma is the deadliest form of skin cancer, and early detection is crucial for patient survival. Computer systems can assist in melanoma detection, but are not widespread in clinical practice. In 2016, an open challenge in classification of dermoscopic images of skin lesions was announced. A training set of 900 images with corresponding class labels and semi-automatic/manual segmentation masks was released for the challenge. An independent test set of 379 images, of which 75 were of melanomas, was used to rank the participants. This article demonstrates the impact of ranking criteria, segmentation method and classifier, and highlights the clinical perspective. We compare five different measures of diagnostic accuracy by analysing the resulting ranking of the computer systems in the challenge. The choice of performance measure had a great impact on the ranking: systems that were ranked among the top three for one measure dropped to the bottom half when the performance measure was changed. Nevus Doctor, a computer system previously developed by the authors, was entered in the challenge and used to investigate the impact of segmentation and classifier. The diagnostic accuracy when using an automatic versus the semi-automatic/manual segmentation is investigated. The unexpectedly small impact of segmentation method suggests that making the automatic segmentation more closely resemble the semi-automatic/manual segmentation will not improve diagnostic accuracy substantially. A small set of similar classification algorithms is used to investigate the impact of classifier on the diagnostic accuracy. The variability in diagnostic accuracy across classifier algorithms was larger than the variability across segmentation methods, which suggests a focus for future investigations. From a clinical perspective, the misclassification of a melanoma as benign has far greater cost than the misclassification of a benign lesion as melanoma. For computer systems to have clinical impact, their performance should be ranked by a high-sensitivity measure.
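To illustrate how the choice of performance measure can reorder a ranking, the following is a minimal sketch in Python. It is not the authors' analysis: the three systems and their confusion counts are invented for illustration, and the three measures shown (accuracy, sensitivity, balanced accuracy) are common choices assumed here, not necessarily the five measures compared in the article. Only the test-set sizes (379 images, 75 melanomas) come from the abstract.

```python
# Test-set sizes taken from the abstract: 379 images, 75 of them melanomas.
N_MELANOMA = 75
N_BENIGN = 379 - N_MELANOMA  # 304 benign lesions

# Hypothetical systems (invented for illustration only):
# name -> (true positives among melanomas, true negatives among benign lesions)
systems = {
    "A": (30, 295),  # conservative: misses many melanomas, few false alarms
    "B": (60, 220),  # intermediate
    "C": (70, 150),  # aggressive: catches most melanomas, many false alarms
}


def sensitivity(tp, tn):
    """Fraction of melanomas correctly flagged."""
    return tp / N_MELANOMA


def specificity(tp, tn):
    """Fraction of benign lesions correctly cleared."""
    return tn / N_BENIGN


def accuracy(tp, tn):
    """Fraction of all lesions correctly classified."""
    return (tp + tn) / (N_MELANOMA + N_BENIGN)


def balanced_accuracy(tp, tn):
    """Mean of sensitivity and specificity."""
    return (sensitivity(tp, tn) + specificity(tp, tn)) / 2


def rank(measure):
    """Return system names ordered from best to worst under `measure`."""
    return sorted(systems, key=lambda name: measure(*systems[name]), reverse=True)


if __name__ == "__main__":
    for measure in (accuracy, sensitivity, balanced_accuracy):
        print(f"{measure.__name__:>18}: {' > '.join(rank(measure))}")
```

With these invented counts, each measure produces a different ordering. Because only 75 of the 379 test images are melanomas, plain accuracy is dominated by performance on benign lesions, so a system that misses most melanomas can still rank first on accuracy while ranking last on sensitivity.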

Publication types

Comparative Study

MeSH terms

Algorithms
Computer Systems*
Dermoscopy / methods*
Humans
Melanoma / diagnosis*

Grants and funding

The authors received no specific funding for this work.