CoVi-Net: A hybrid convolutional and vision transformer neural network for retinal vessel segmentation

Minshan Jiang; Yongfei Zhu; Xuedian Zhang

doi:10.1016/j.compbiomed.2024.108047

Comput Biol Med. 2024 Mar:170:108047. doi: 10.1016/j.compbiomed.2024.108047. Epub 2024 Jan 29.

Authors

Minshan Jiang¹, Yongfei Zhu², Xuedian Zhang²

Affiliations

¹ Shanghai Key Laboratory of Contemporary Optics System, College of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China. Electronic address: jiangmsc@gmail.com.
² Shanghai Key Laboratory of Contemporary Optics System, College of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China.

PMID: 38295476
DOI: 10.1016/j.compbiomed.2024.108047

Abstract

Retinal vessel segmentation plays a crucial role in the diagnosis and treatment of ocular pathologies. Current methods have limitations in feature fusion and struggle to capture global and local features from fundus images simultaneously. To address these issues, this study introduces a hybrid network named CoVi-Net, which combines a convolutional neural network with a vision transformer. The proposed model integrates a novel local and global feature aggregation (LGFA) module, which supports long-range information interaction while retaining the ability to gather local information effectively. In addition, we introduce a bidirectional weighted feature fusion (BWF) module. Because semantic information varies across layers, we assign adjustable weights to the different feature layers for adaptive feature fusion. BWF employs a bidirectional fusion strategy to mitigate the decay of effective information. We also incorporate horizontal and vertical connections to enhance feature fusion and utilization across scales, thereby improving the segmentation of multiscale vessel images. Furthermore, we introduce an adaptive lateral feature fusion (ALFF) module that refines the final vessel segmentation map by enriching it with more semantic information from the network. We evaluated the model on three well-established retinal image databases (DRIVE, CHASEDB1, and STARE). Our experimental results demonstrate that CoVi-Net outperforms other state-of-the-art techniques, achieving a global accuracy of 0.9698, 0.9756, and 0.9761 and an area under the curve of 0.9880, 0.9903, and 0.9915 on DRIVE, CHASEDB1, and STARE, respectively. We conducted ablation studies to assess the individual effectiveness of the three modules. In addition, we examined the adaptability of CoVi-Net for segmenting lesion images. Our experiments indicate that the proposed model holds promise in aiding the diagnosis of retinal vascular disorders.
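The idea of assigning adjustable weights to different feature layers can be illustrated with a small sketch. This is not the authors' BWF implementation: the abstract does not say how the weights are constrained or normalized, so the sketch assumes the normalized-weight scheme popularized by BiFPN (clip each weight to be non-negative, then divide by the sum). The function name `fuse_weighted`, the `eps` parameter, and the use of flattened lists in place of feature maps are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


def fuse_weighted(
    features: Sequence[Sequence[float]],
    raw_weights: Sequence[float],
    eps: float = 1e-4,
) -> List[float]:
    """Fuse same-length (flattened) feature maps with normalized weights.

    Assumption: weights are clipped to be non-negative and then normalized
    by their sum, as in BiFPN-style fusion. The paper's abstract does not
    specify this detail.
    """
    if len(features) != len(raw_weights):
        raise ValueError("need exactly one weight per feature map")
    length = len(features[0])
    if any(len(f) != length for f in features):
        raise ValueError("all feature maps must have the same length")

    # Clip negative weights to zero so no layer is subtracted from the result.
    positive = [max(w, 0.0) for w in raw_weights]
    # eps guards against division by zero when every weight is clipped.
    total = sum(positive) + eps
    normalized = [w / total for w in positive]

    # Element-wise weighted sum across the feature maps.
    return [
        sum(w * f[i] for w, f in zip(normalized, features))
        for i in range(length)
    ]


# Hypothetical example: a shallow and a deep feature map, already resized
# to the same resolution and flattened.
shallow = [1.0, 2.0]
deep = [3.0, 4.0]
fused = fuse_weighted([shallow, deep], raw_weights=[1.0, 1.0])
```

In a trained network the raw weights would be learnable parameters updated by backpropagation rather than fixed constants, and the feature maps would be multi-channel tensors rather than flat lists.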

Keywords: Bidirectional weighted feature fusion; Deep learning; Retinal vessel segmentation; Transformer.

MeSH terms

Databases, Factual
Fundus Oculi
Image Processing, Computer-Assisted
Neural Networks, Computer*
Retinal Vessels* / diagnostic imaging
Semantics