CoT-XNet: contextual transformer with Xception network for diabetic retinopathy grading

Phys Med Biol. 2022 Dec 6;67(24). doi: 10.1088/1361-6560/ac9fa0.

Abstract

Objective.Diabetic retinopathy (DR) grading is primarily performed by assessing fundus images. Many types of lesions, such as microaneurysms, hemorrhages, and soft exudates, are available simultaneously in a single image. However, their sizes may be small, making it difficult to differentiate adjacent DR grades even using deep convolutional neural networks (CNNs). Recently, a vision transformer has shown comparable or even superior performance to CNNs, and it also learns different visual representations from CNNs. Inspired by this finding, we propose a two-path contextual transformer with Xception network (CoT-XNet) to improve the accuracy of DR grading.Approach.The representations learned by CoT through one path and those by the Xception network through another path are concatenated before the fully connected layer. Meanwhile, the dedicated pre-processing, data resampling, and test time augmentation strategies are implemented. The performance of CoT-XNet is evaluated in the publicly available datasets of DDR, APTOS2019, and EyePACS, which include over 50 000 images. Ablation experiments and comprehensive comparisons with various state-of-the-art (SOTA) models have also been performed.Main results.Our proposed CoT-XNet shows better performance than available SOTA models, and the accuracy and Kappa are 83.10% and 0.8496, 84.18% and 0.9000 and 84.10% and 0.7684 respectively, in the three datasets (listed above). Class activation maps of CoT and Xception networks are different and complementary in most images.Significance.By concatenating the different visual representations learned by CoT and Xception networks, CoT-XNet can accurately grade DR from fundus images and present good generalizability. CoT-XNet will promote the application of artificial intelligence-based systems in the DR screening of large-scale populations.

Keywords: convolutional neural network; deep learning; diabetic retinopathy grading; vision transformer.

MeSH terms

  • Artificial Intelligence
  • Diabetes Mellitus*
  • Diabetic Retinopathy* / diagnostic imaging
  • Humans