An interpretable transformer network for the retinal disease classification using optical coherence tomography

Jingzhen He; Junxia Wang; Zeyu Han; Jun Ma; Chongjing Wang; Meng Qi

doi:10.1038/s41598-023-30853-z

Sci Rep. 2023 Mar 3;13(1):3637. doi: 10.1038/s41598-023-30853-z.

Authors

Jingzhen He¹, Junxia Wang², Zeyu Han³, Jun Ma⁴, Chongjing Wang⁵, Meng Qi⁶

Affiliations

¹ Department of Radiology, Qilu Hospital of Shandong University, Jinan, 250012, China. hjzhhjzh@163.com.
² School of Information Science and Engineering, Shandong Normal University, Jinan, 250358, China.
³ School of Mathematics and Statistics, Shandong University, Weihai, 264209, China.
⁴ School of Cyber Science and Engineering, Southeast University, Nanjing, 211189, China.
⁵ China Academy of Information and Communications Technology, Beijing, 100191, China.
⁶ School of Information Science and Engineering, Shandong Normal University, Jinan, 250358, China. qimeng@sdnu.edu.cn.

Abstract

Retinal diseases such as age-related macular degeneration and diabetic macular edema can lead to irreversible blindness. Optical coherence tomography (OCT) lets doctors see cross-sections of the retinal layers and provide patients with a diagnosis. Manual reading of OCT images is time-consuming, labor-intensive and even error-prone. Computer-aided diagnosis algorithms improve efficiency by automatically analyzing and diagnosing retinal OCT images. However, the accuracy and interpretability of these algorithms can be further improved through effective feature extraction, loss optimization and visualization analysis. In this paper, we propose an interpretable Swin-Poly Transformer network for automatic retinal OCT image classification. By shifting the window partition, the Swin-Poly Transformer builds connections between neighboring non-overlapping windows of the previous layer and can therefore flexibly model multi-scale features. In addition, the Swin-Poly Transformer adjusts the weights of the polynomial bases of the cross-entropy loss to refine it for retinal OCT image classification. The proposed method also provides confidence score maps that help medical practitioners understand the model's decision-making process. Experiments on OCT2017 and OCT-C8 show that the proposed method outperforms both convolutional neural network (CNN) approaches and ViT, with an accuracy of 99.80% and an AUC of 99.99%.
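The shifted-window idea can be illustrated with a minimal NumPy sketch. It assumes a single (H, W, C) feature map whose sides are divisible by a window size `m`, and it shows only the partitioning step: a cyclic shift by `m // 2` (as in the original Swin Transformer) applied before splitting into windows, so that each shifted window straddles several windows of the regular partition. The attention computation and the attention mask that Swin applies to wrapped-around pixels are omitted, and the function names are hypothetical, not taken from the authors' code.

```python
import numpy as np


def window_partition(x: np.ndarray, m: int) -> np.ndarray:
    """Split an (H, W, C) map into non-overlapping m x m windows.

    Returns an array of shape (num_windows, m, m, C), windows in row-major order.
    """
    h, w, c = x.shape
    assert h % m == 0 and w % m == 0, "H and W must be divisible by the window size"
    # (H, W, C) -> (H/m, m, W/m, m, C): separate window index from in-window offset
    x = x.reshape(h // m, m, w // m, m, c)
    # Bring the two window indices together, then flatten them into one axis
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, m, m, c)


def shifted_window_partition(x: np.ndarray, m: int) -> np.ndarray:
    """Cyclically shift the map by m // 2 pixels up and left, then partition.

    Each resulting window mixes pixels from up to four regular windows.
    (Swin's attention mask for wrapped-around pixels is not modeled here.)
    """
    s = m // 2
    shifted = np.roll(x, shift=(-s, -s), axis=(0, 1))
    return window_partition(shifted, m)


if __name__ == "__main__":
    fmap = np.arange(64).reshape(8, 8, 1)  # toy 8 x 8 map of pixel indices
    print(window_partition(fmap, 4)[0, :, :, 0])          # rows 0-3, cols 0-3
    print(shifted_window_partition(fmap, 4)[0, :, :, 0])  # rows 2-5, cols 2-5
```

With an 8 × 8 map and `m = 4`, the first shifted window covers rows and columns 2 through 5, which overlaps all four windows of the regular partition; alternating regular and shifted partitions between consecutive layers is what lets information cross window boundaries.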
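The loss refinement can be sketched under one assumption: that it follows the Poly-1 form of PolyLoss. In that view, cross entropy is the polynomial series -log(p_t) = Σ_{j≥1} (1 - p_t)^j / j, where p_t is the predicted probability of the true class, and only the coefficient of the first basis (1 - p_t) is perturbed by ε, giving L = -log(p_t) + ε(1 - p_t). The abstract does not state the exact variant or the value of ε, so `epsilon` below is a hypothetical hyperparameter and this NumPy code is illustrative only.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def poly1_cross_entropy(logits: np.ndarray, labels: np.ndarray, epsilon: float = 1.0) -> float:
    """Mean Poly-1 loss: cross entropy plus epsilon * (1 - p_t).

    logits: (N, K) raw class scores; labels: (N,) integer class indices.
    epsilon is an assumed hyperparameter; epsilon = 0 gives plain cross entropy.
    """
    probs = softmax(logits)
    # Probability assigned to the true class of each sample
    pt = probs[np.arange(len(labels)), labels]
    ce = -np.log(pt)
    # Reweight the first polynomial basis (1 - p_t) of the cross-entropy expansion
    return float(np.mean(ce + epsilon * (1.0 - pt)))
```

Setting `epsilon = 0` recovers plain cross entropy; a positive ε adds a penalty that grows as the predicted probability of the true class drops, which is one way of "modifying the importance" of a polynomial basis.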

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Diabetic Retinopathy*
Humans
Macular Edema*
Retina
Retinal Diseases*
Tomography, Optical Coherence