Knowledge Distillation in Histology Landscape by Multi-Layer Features Supervision

IEEE J Biomed Health Inform. 2023 Jan 17:PP. doi: 10.1109/JBHI.2023.3237749. Online ahead of print.

Abstract

Automatic tissue classification is a fundamental task in computational pathology for profiling tumor micro-environments. Deep learning has advanced tissue classification performance at the cost of significant computational power. Shallow networks have also been trained end-to-end with direct supervision; however, their performance degrades because they fail to capture robust tissue heterogeneity. Knowledge distillation has recently been employed to improve the performance of shallow networks, used as student networks, by providing additional supervision from deep neural networks, used as teacher networks. In the current work, we propose a novel knowledge distillation algorithm to improve the performance of shallow networks for tissue phenotyping in histology images. For this purpose, we propose multi-layer feature distillation such that a single layer in the student network receives supervision from multiple teacher layers. In the proposed algorithm, the feature maps of the two layers are matched in size using a learnable multi-layer perceptron, and the distance between the feature maps of the two layers is minimized during the training of the student network. The overall objective function is computed as the sum of the losses over multiple layer combinations, each weighted by a learnable attention-based parameter. We name the proposed algorithm Knowledge Distillation for Tissue Phenotyping (KDTP). Experiments are performed on five publicly available histology image classification datasets using several teacher-student network combinations within the KDTP algorithm. Our results demonstrate a significant performance increase in the student networks trained with the proposed KDTP algorithm compared to direct supervision-based training methods.
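To make the size-matching and attention-weighting steps concrete, the following is a minimal PyTorch sketch of one possible instantiation of the multi-layer feature distillation loss described above. The class name, the projection architecture, the pooled-feature interface, and the softmax attention parameterization are all illustrative assumptions, not the authors' released implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiLayerKDLoss(nn.Module):
    """Sketch: one student layer supervised by multiple teacher layers.

    Hypothetical implementation of the abstract's description; names
    and architecture choices are assumptions, not the KDTP source code.
    """

    def __init__(self, student_dim, teacher_dims):
        super().__init__()
        # Learnable MLPs that project the student feature to each
        # teacher layer's dimension so the two feature maps can be
        # compared (the "size matching" step in the abstract).
        self.projections = nn.ModuleList(
            nn.Sequential(nn.Linear(student_dim, t), nn.ReLU(), nn.Linear(t, t))
            for t in teacher_dims
        )
        # Learnable logits producing the attention-based weight for
        # each student-teacher layer combination.
        self.attn_logits = nn.Parameter(torch.zeros(len(teacher_dims)))

    def forward(self, student_feat, teacher_feats):
        # student_feat: (B, student_dim) pooled feature of one student layer.
        # teacher_feats: list of (B, t_dim) pooled teacher-layer features.
        weights = torch.softmax(self.attn_logits, dim=0)
        loss = student_feat.new_zeros(())
        for w, proj, t_feat in zip(weights, self.projections, teacher_feats):
            # Minimize the distance between the matched feature maps,
            # weighted by the learned attention parameter; the teacher
            # feature is detached because the teacher stays frozen.
            loss = loss + w * F.mse_loss(proj(student_feat), t_feat.detach())
        return loss

# Example: a student feature of width 128 distilled against two
# teacher layers of widths 256 and 512.
kd = MultiLayerKDLoss(128, [256, 512])
loss = kd(torch.randn(8, 128), [torch.randn(8, 256), torch.randn(8, 512)])

Detaching the teacher features ensures gradients flow only into the student and the projection MLPs, which matches the standard distillation setup in which the teacher network is fixed; in training, this distillation loss would be added to the usual cross-entropy classification loss of the student.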