Detecting Eczema Areas in Digital Images: An Impossible Task?

Guillem Hurault; Kevin Pan; Ricardo Mokhtari; Bayanne Olabi; Eleanor Earp; Lloyd Steele; Hywel C Williams; Reiko J Tanaka

doi:10.1016/j.xjidi.2022.100133

JID Innov. 2022 May 23;2(5):100133. doi: 10.1016/j.xjidi.2022.100133. eCollection 2022 Sep.

Authors

Guillem Hurault¹, Kevin Pan¹, Ricardo Mokhtari¹, Bayanne Olabi², Eleanor Earp³, Lloyd Steele⁴, Hywel C Williams²,⁵, Reiko J Tanaka¹

Affiliations

¹ Department of Bioengineering, Imperial College London, London, United Kingdom.
² Biosciences Institute, Faculty of Medical Sciences, Newcastle University, United Kingdom.
³ Department of Dermatology, Lauriston Building, Edinburgh, United Kingdom.
⁴ Department of Dermatology, The Royal London Hospital, Barts Health NHS Trust, London, United Kingdom.
⁵ Centre of Evidence-Based Dermatology, University of Nottingham, Nottingham, United Kingdom.

Abstract

Assessing the severity of atopic dermatitis (AD, or eczema) traditionally relies on face-to-face assessment by healthcare professionals and may suffer from inter- and intra-rater variability. With the expanding role of telemedicine, several machine learning algorithms have been proposed to assess AD severity automatically from digital images. These algorithms usually detect and then delineate (segment) AD lesions before assessing lesional severity, and they are trained on AD areas identified by healthcare professionals. To evaluate the reliability of such data, we estimated the inter-rater reliability of AD segmentation in digital images. Four dermatologists independently segmented AD lesions in 80 digital images collected in a published clinical trial. We estimated the inter-rater reliability of the AD segmentation using the intraclass correlation coefficient at the pixel and area levels for different image resolutions. The average intraclass correlation coefficient was 0.45 (standard error = 0.04), corresponding to poor agreement between raters, and the degree of agreement for AD segmentation varied from image to image. AD segmentation in digital images is highly rater dependent, even among dermatologists. Such limitations need to be taken into consideration when AD segmentation data are used to train machine learning algorithms that assess eczema severity.
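The pixel- and area-level reliability analysis described in the abstract can be sketched in Python. This is a minimal sketch, not the authors' code. It assumes the classical ANOVA-based ICC(2,1) of Shrout and Fleiss (two-way random effects, absolute agreement, single rater) and binary masks stored as nested lists of 0/1. The abstract does not specify the ICC form or the estimation procedure, so the published method may differ. All helper names (`icc_2_1`, `downsample`, `pixel_ratings`, `area_fraction`, `area_ratings`) are hypothetical. Lower image resolutions are emulated here by block-averaging each mask, which is also an assumption.

```python
from typing import List, Sequence

# Hypothetical helpers for illustration only; not the authors' implementation.
Matrix = List[List[float]]


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ANOVA-based ICC(2,1): two-way random effects, absolute agreement,
    single rater (Shrout & Fleiss, 1979). Rows are subjects, columns raters."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(x for row in ratings for x in row) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_err = ss_total - ss_rows - ss_cols  # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    denom = ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    if denom == 0:
        raise ValueError("ICC undefined: ratings have no variance")
    return (ms_rows - ms_err) / denom


def downsample(mask: Matrix, factor: int) -> Matrix:
    """Lower the resolution by averaging non-overlapping factor x factor
    blocks; trailing rows/columns that do not fill a whole block are dropped."""
    h, w = len(mask) // factor, len(mask[0]) // factor
    return [
        [
            sum(mask[i * factor + di][j * factor + dj]
                for di in range(factor) for dj in range(factor)) / factor ** 2
            for j in range(w)
        ]
        for i in range(h)
    ]


def pixel_ratings(masks: Sequence[Matrix]) -> Matrix:
    """Pixel level, one image: each pixel is a subject and each rater's mask
    value (0/1, or a block fraction after downsampling) is a rating."""
    h, w = len(masks[0]), len(masks[0][0])
    return [[m[i][j] for m in masks] for i in range(h) for j in range(w)]


def area_fraction(mask: Matrix) -> float:
    """Fraction of the image marked as AD by one rater."""
    return sum(map(sum, mask)) / (len(mask) * len(mask[0]))


def area_ratings(images: Sequence[Sequence[Matrix]]) -> Matrix:
    """Area level: each image is a subject; images[i][r] is rater r's mask."""
    return [[area_fraction(m) for m in masks] for masks in images]


# Worked example from Shrout & Fleiss (1979): 6 subjects rated by 4 judges.
SF_EXAMPLE = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]

if __name__ == "__main__":
    print(f"ICC(2,1) = {icc_2_1(SF_EXAMPLE):.2f}")  # ICC(2,1) = 0.29
```

ICC(2,1) is used in this sketch because it treats raters as a random sample and penalises systematic differences between them (for example, one rater consistently drawing larger lesions), which matches the question of whether segmentations from different dermatologists are interchangeable. At the pixel level, an image in which no rater marks any pixel, or every rater marks every pixel, has no variance, and the function raises `ValueError` rather than returning a misleading value.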

Keywords: AD, atopic dermatitis; ICC, intraclass correlation coefficient; IRR, inter-rater reliability; KA, Krippendorff’s alpha; ML, machine learning.

Grants and funding

MC_PC_19040/MRC_/Medical Research Council/United Kingdom