CoT-XNet: contextual transformer with Xception network for diabetic retinopathy grading

Shuiqing Zhao; Yanan Wu; Mengmeng Tong; Yudong Yao; Wei Qian; Shouliang Qi

doi:10.1088/1361-6560/ac9fa0

Phys Med Biol. 2022 Dec 6;67(24). doi: 10.1088/1361-6560/ac9fa0.

Authors

Shuiqing Zhao¹ ², Yanan Wu¹, Mengmeng Tong³, Yudong Yao⁴, Wei Qian¹, Shouliang Qi¹ ²

Affiliations

¹ College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, People's Republic of China.
² Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, Northeastern University, Shenyang, People's Republic of China.
³ Ningbo Blue Illumination Tech Co., Ltd, Ningbo, People's Republic of China.
⁴ Department of Electrical and Computer Engineering, Stevens Institute of Technology, Hoboken, NJ, United States of America.

PMID: 36322995

Abstract

Objective. Diabetic retinopathy (DR) grading is primarily performed by assessing fundus images. Many types of lesions, such as microaneurysms, hemorrhages, and soft exudates, can be present simultaneously in a single image. However, these lesions may be small, making it difficult to differentiate adjacent DR grades, even with deep convolutional neural networks (CNNs). Recently, vision transformers have shown performance comparable or even superior to that of CNNs, while learning visual representations that differ from those learned by CNNs. Inspired by this finding, we propose a two-path contextual transformer with Xception network (CoT-XNet) to improve the accuracy of DR grading.

Approach. The representations learned by CoT along one path and by the Xception network along the other path are concatenated before the fully connected layer. In addition, dedicated pre-processing, data resampling, and test-time augmentation strategies are implemented. The performance of CoT-XNet is evaluated on the publicly available DDR, APTOS2019, and EyePACS datasets, which together include over 50 000 images. Ablation experiments and comprehensive comparisons with various state-of-the-art (SOTA) models are also performed.

Main results. CoT-XNet outperforms the SOTA models it is compared with, achieving an accuracy and Kappa of 83.10% and 0.8496 on DDR, 84.18% and 0.9000 on APTOS2019, and 84.10% and 0.7684 on EyePACS. Class activation maps of the CoT and Xception networks differ and are complementary in most images.

Significance. By concatenating the different visual representations learned by the CoT and Xception networks, CoT-XNet can accurately grade DR from fundus images and generalizes well. CoT-XNet will promote the application of artificial intelligence-based systems in DR screening of large-scale populations.
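The Approach paragraph describes a late-fusion design: each path produces a feature vector, the two vectors are concatenated, and a single fully connected layer maps the result to grade scores. The NumPy sketch below illustrates only that data flow, plus test-time augmentation (TTA) by averaging softmax outputs over flipped views. Everything concrete in it is an assumption: `cot_path` and `xception_path` are hypothetical stand-ins (fixed random projections of quadrant statistics), not the actual CoT and Xception backbones; the feature sizes and weights are arbitrary and untrained; five grades (0 to 4) are assumed; and flip-based TTA is one common choice, not necessarily the augmentation set the authors used.

```python
import numpy as np

NUM_GRADES = 5  # assumption: DR grades 0 (no DR) to 4 (proliferative DR)
rng = np.random.default_rng(0)

# Hypothetical, untrained weights; the real paths are the CoT and Xception backbones.
W_COT = rng.standard_normal((12, 8))             # 12 quadrant means -> 8-d "CoT" feature
W_XCEPTION = rng.standard_normal((12, 6))        # 12 quadrant maxima -> 6-d "Xception" feature
W_FC = rng.standard_normal((8 + 6, NUM_GRADES))  # fully connected layer on the fused vector
B_FC = np.zeros(NUM_GRADES)


def _quadrants(image):
    """Split an (H, W, C) image into its four spatial quadrants."""
    h, w = image.shape[0] // 2, image.shape[1] // 2
    return [image[:h, :w], image[:h, w:], image[h:, :w], image[h:, w:]]


def cot_path(image):
    """Hypothetical stand-in for the CoT path: per-quadrant channel means, then a projection."""
    stats = np.concatenate([q.mean(axis=(0, 1)) for q in _quadrants(image)])
    return stats @ W_COT


def xception_path(image):
    """Hypothetical stand-in for the Xception path: per-quadrant channel maxima, then a projection."""
    stats = np.concatenate([q.max(axis=(0, 1)) for q in _quadrants(image)])
    return stats @ W_XCEPTION


def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def predict_probs(image):
    """Concatenate both path representations before the fully connected layer."""
    fused = np.concatenate([cot_path(image), xception_path(image)])
    return softmax(fused @ W_FC + B_FC)


def predict_with_tta(image):
    """TTA: average class probabilities over the original and flipped views."""
    views = [image, image[:, ::-1], image[::-1, :], image[::-1, ::-1]]
    probs = np.mean([predict_probs(v) for v in views], axis=0)
    return int(np.argmax(probs)), probs


image = rng.random((32, 32, 3))  # placeholder for a pre-processed fundus image
grade, probs = predict_with_tta(image)
print(grade, probs.round(3))
```

Because concatenation happens before the classifier, the fully connected layer can weight features from either path, which is one simple way to exploit representations that differ between two networks.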
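The abstract reports accuracy alongside Kappa but does not state which Kappa variant is used. Quadratic weighted kappa is a common choice for ordinal DR grades because it penalizes a prediction more heavily the further it lands from the true grade. The NumPy sketch below assumes that variant and five grades (0 to 4); the function names and sample labels are illustrative, not the authors' evaluation code or data.

```python
import numpy as np


def quadratic_weighted_kappa(y_true, y_pred, num_classes=5):
    """Cohen's kappa with quadratic disagreement weights for ordinal grades 0..num_classes-1."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)

    # Observed confusion matrix O[i, j]: true grade i, predicted grade j.
    observed = np.zeros((num_classes, num_classes))
    np.add.at(observed, (y_true, y_pred), 1)

    # Quadratic weights: 0 on the diagonal, growing with the squared grade distance.
    idx = np.arange(num_classes)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (num_classes - 1) ** 2

    # Expected matrix under chance agreement, built from the two marginal histograms.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()

    return 1.0 - (weights * observed).sum() / (weights * expected).sum()


def accuracy(y_true, y_pred):
    """Fraction of images whose predicted grade equals the true grade."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


y_true = [0, 0, 1, 2, 3, 4, 2, 1]
y_pred = [0, 1, 1, 2, 4, 4, 2, 0]  # hypothetical predictions, not data from the paper
print(accuracy(y_true, y_pred), quadratic_weighted_kappa(y_true, y_pred))
```

Accuracy treats every misgrade equally, whereas the quadratic weights make a grade 0 versus grade 4 confusion 16 times as costly as a neighboring-grade confusion; this is one reason the two metrics can rank datasets differently.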

Keywords: convolutional neural network; deep learning; diabetic retinopathy grading; vision transformer.

MeSH terms

Artificial Intelligence
Diabetes Mellitus*
Diabetic Retinopathy* / diagnostic imaging
Humans