Determining the clinical applicability of machine learning models through assessment of reporting across skin phototypes and rarer skin cancer types: A systematic review

Lloyd Steele; Xiang Li Tan; Bayanne Olabi; Jing Mia Gao; Reiko J Tanaka; Hywel C Williams

doi:10.1111/jdv.18814


J Eur Acad Dermatol Venereol. 2023 Apr;37(4):657-665. doi: 10.1111/jdv.18814. Epub 2023 Jan 2.

Authors

Lloyd Steele¹ ², Xiang Li Tan³, Bayanne Olabi⁴, Jing Mia Gao¹, Reiko J Tanaka⁵, Hywel C Williams⁶

Affiliations

¹ Department of Dermatology, The Royal London Hospital, London, UK.
² Centre for Cell Biology and Cutaneous Research, Blizard Institute, Queen Mary University of London, London, UK.
³ St George's University Hospitals NHS Foundation Trust, London, UK.
⁴ Biosciences Institute, Newcastle University, Newcastle, UK.
⁵ Department of Bioengineering, Imperial College London, London, UK.
⁶ Centre of Evidence-Based Dermatology, School of Medicine, University of Nottingham, Nottingham, UK.

PMID: 36514990

Abstract

Machine learning (ML) models for skin cancer recognition may perform differently across skin phototypes and skin cancer types, and overall performance metrics alone cannot detect poor performance in a subgroup. We aimed (1) to assess whether studies of ML models reported results separately for different skin phototypes and for rarer skin cancers, and (2) to represent graphically the skin cancer training datasets used by current ML models. In this systematic review, we searched PubMed, Embase and CENTRAL. We included all studies in medical journals from 1 January 2012 to 22 September 2021 that assessed an ML technique for skin cancer diagnosis using clinical or dermoscopic images. No language restrictions were applied. We defined rarer skin cancers as skin cancers other than pigmented melanoma, basal cell carcinoma and squamous cell carcinoma. We identified 114 studies for inclusion. Rarer skin cancers were included in 8/114 studies (7.0%), and results for a rarer skin cancer were reported separately in 1/114 studies (0.9%). Performance was reported across all skin phototypes in 1/114 studies (0.9%), but in that study performance was uncertain for skin phototypes I and VI because these phototypes were minimally represented in the test dataset (9/3756 and 1/3756, respectively). Public datasets were the most frequently used training datasets, the most widely used being the International Skin Imaging Collaboration (ISIC) archive (65/114 studies, 57.0%), but the largest datasets were private. Our review found that most ML models did not report performance separately for rarer skin cancers or for different skin phototypes. Some variability in ML model performance across subgroups is expected, but the current lack of transparency is not justifiable and risks models being used inappropriately in populations in which their accuracy is low.
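The abstract's percentages and its point about subgroup uncertainty can be checked with a short sketch. The Python below is an illustration only, not part of the review's methods. It recomputes the quoted percentages and uses the Wilson score interval (our choice of method; the review does not state one) to show how wide a 95% confidence interval remains when a phototype contributes only 9 or 1 test cases, even in the hypothetical best case where every one is classified correctly. It also bounds how little those cases can move overall accuracy in a test set of 3756. The names `pct`, `wilson_interval` and `max_shift` are hypothetical.

```python
from math import sqrt

# Counts quoted in the abstract (114 included studies).
STUDIES = 114
counts = {
    "included rarer skin cancers": 8,
    "reported a rarer skin cancer separately": 1,
    "reported all skin phototypes": 1,
    "used the ISIC archive": 65,
}


def pct(k: int, n: int) -> float:
    """Percentage k/n rounded to one decimal place, as reported in the abstract."""
    return round(100 * k / n, 1)


def wilson_interval(k: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for a proportion k/n (z = 1.96 gives ~95% coverage).

    Illustrative choice of interval; not taken from the review.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot.
    return max(0.0, centre - half), min(1.0, centre + half)


for label, k in counts.items():
    print(f"{label}: {k}/{STUDIES} = {pct(k, STUDIES)}%")

# Test-set sizes from the one study that reported all phototypes.
TEST_N = 3756
SUBGROUP_N = {"I": 9, "VI": 1}

# Hypothetical best case: every case in the subgroup classified correctly.
for phototype, n in SUBGROUP_N.items():
    lo, hi = wilson_interval(n, n)
    print(f"phototype {phototype}: {n}/{n} correct -> 95% CI {lo:.2f}-{hi:.2f}")

# Worst case: misclassifying every case in both subgroups changes
# overall accuracy by at most this fraction.
max_shift = sum(SUBGROUP_N.values()) / TEST_N
print(f"maximum effect on overall accuracy: {100 * max_shift:.2f} percentage points")
```

Running it gives intervals of roughly 0.70 to 1.00 for phototype I and 0.21 to 1.00 for phototype VI, and a maximum shift of about 0.27 percentage points in overall accuracy. This illustrates why an aggregate metric alone cannot reveal poor performance in such small subgroups.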

Publication types

Systematic Review
Review

MeSH terms

Carcinoma, Basal Cell* / pathology
Carcinoma, Squamous Cell* / pathology
Humans
Melanoma* / diagnosis
Melanoma* / pathology
Skin / pathology
Skin Neoplasms* / pathology