A loss-based patch label denoising method for improving whole-slide image analysis using a convolutional neural network

Murtaza Ashraf; Willmer Rafell Quiñones Robles; Mujin Kim; Young Sin Ko; Mun Yong Yi

doi:10.1038/s41598-022-05001-8

Sci Rep. 2022 Jan 26;12(1):1392. doi: 10.1038/s41598-022-05001-8.

Authors

Murtaza Ashraf¹, Willmer Rafell Quiñones Robles¹, Mujin Kim¹, Young Sin Ko², Mun Yong Yi³

Affiliations

¹ Department of Industrial and Systems Engineering, Graduate School of Knowledge Service Engineering, Korea Advanced Institute of Science and Technology, Daejeon, South Korea.
² Pathology Center, Seegene Medical Foundation, Seoul, South Korea.
³ Department of Industrial and Systems Engineering, Graduate School of Knowledge Service Engineering, Korea Advanced Institute of Science and Technology, Daejeon, South Korea. munyi@kaist.ac.kr.

Abstract

This paper proposes a deep learning-based patch label denoising method (LossDiff) for improving the classification of whole-slide images of cancer using a convolutional neural network (CNN). Automated whole-slide image classification is challenging because it requires a large amount of labeled data. Pathologists annotate regions of interest by marking malignant areas, which carries a high risk of introducing patch-level label noise: small benign regions enclosed within the malignant annotations inherit the malignant label, resulting in low classification accuracy with many Type-II errors. To overcome this problem, this paper presents a simple yet effective method for classifying patches with noisy labels. The proposed method, validated using stomach cancer images, improves significantly on existing methods in patch-based cancer classification, with accuracies of 98.81%, 97.30% and 89.47% for binary, ternary, and quaternary classification, respectively. Moreover, we conduct several experiments at different noise levels using a publicly available dataset to further demonstrate the robustness of the proposed method. Given the high cost of producing explicit annotations for whole-slide images and the unavoidably error-prone nature of human annotation of medical images, the proposed method has practical implications for whole-slide image annotation and automated cancer diagnosis.
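
The abstract does not describe how LossDiff works internally, so the following is not the authors' algorithm. It is a minimal sketch of a generic loss-thresholding heuristic, included only to illustrate what "loss-based" label denoising can mean: a patch whose assigned label receives low probability from the model has a high cross-entropy loss and may be mislabeled. The function names, the threshold of 1.0, and the toy probabilities are all hypothetical.

```python
import math


def cross_entropy(prob_of_label, eps=1e-12):
    """Per-patch cross-entropy: negative log of the probability
    the model assigns to the patch's annotated label."""
    return -math.log(max(prob_of_label, eps))


def flag_suspect_patches(probs, labels, threshold=1.0):
    """Return indices of patches whose loss under the annotated
    label exceeds the (hypothetical) threshold.

    probs  -- list of per-class probability lists, one per patch
    labels -- list of annotated class indices, one per patch
    """
    suspects = []
    for i, (p, y) in enumerate(zip(probs, labels)):
        if cross_entropy(p[y]) > threshold:
            suspects.append(i)
    return suspects


# Toy data (assumed values): class 0 = benign, class 1 = malignant.
# Patch 1 is annotated malignant, but the model strongly predicts benign,
# as might happen for a small benign region inside a malignant annotation.
probs = [[0.05, 0.95], [0.90, 0.10], [0.20, 0.80], [0.97, 0.03]]
labels = [1, 1, 1, 0]

suspects = flag_suspect_patches(probs, labels)
print(suspects)
```

In this sketch the flagged patches would be candidates for relabeling or exclusion before retraining; how the published method actually uses the loss is specified in the full paper, not here.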

Publication types

Research Support, Non-U.S. Gov't