OCT²Former: A retinal OCT-angiography vessel segmentation transformer

Xiao Tan; Xinjian Chen; Qingquan Meng; Fei Shi; Dehui Xiang; Zhongyue Chen; Lingjiao Pan; Weifang Zhu

doi:10.1016/j.cmpb.2023.107454

Comput Methods Programs Biomed. 2023 May:233:107454. doi: 10.1016/j.cmpb.2023.107454. Epub 2023 Mar 5.

Authors

Xiao Tan¹, Xinjian Chen², Qingquan Meng¹, Fei Shi¹, Dehui Xiang¹, Zhongyue Chen¹, Lingjiao Pan³, Weifang Zhu⁴

Affiliations

¹ MIPAV Lab, the School of Electronic and Information Engineering, Soochow University, Jiangsu, China.
² MIPAV Lab, the School of Electronic and Information Engineering, Soochow University, Jiangsu, China; The State Key Laboratory of Radiation Medicine and Protection, Soochow University, Jiangsu, China.
³ School of Electrical and Information Engineering, Jiangsu University of Technology, Jiangsu, China.
⁴ MIPAV Lab, the School of Electronic and Information Engineering, Soochow University, Jiangsu, China. Electronic address: wfzhu@suda.edu.cn.

PMID: 36921468

Abstract

Background and objective: Retinal vessel segmentation plays an important role in automatic retinal disease screening and diagnosis. The key challenges of the task are segmenting thin vessels and maintaining vessel connectivity. Optical coherence tomography angiography (OCTA) is a noninvasive imaging technique that can reveal retinal vessels at high resolution. To make full use of this high resolution, a new end-to-end transformer-based network named OCT²Former (OCT-a Transformer) is proposed to segment retinal vessels accurately in OCTA images.

Methods: The proposed OCT²Former has an encoder-decoder structure consisting mainly of a dynamic transformer encoder and a lightweight decoder. The dynamic transformer encoder comprises a dynamic token aggregation transformer and an auxiliary convolution branch. The dynamic token aggregation transformer, built on multi-head dynamic token aggregation attention, is designed to capture global retinal vessel context information from the first layer onward throughout the network. The auxiliary convolution branch compensates for the transformer's lack of inductive bias and assists efficient feature extraction. A convolution-based lightweight decoder decodes features efficiently and reduces the complexity of OCT²Former.
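The abstract does not describe how tokens are aggregated, so the following is only a rough sketch of the general idea of attending to a reduced set of aggregated tokens, not the authors' implementation. It assumes a single head, no learned projections, and fixed average pooling over consecutive tokens as a stand-in for the dynamic aggregation; the names `aggregate_tokens`, `aggregated_attention`, and `group_size` are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_tokens(tokens, group_size):
    """Average consecutive groups of tokens.

    ASSUMPTION: fixed average pooling stands in for the paper's dynamic
    aggregation, whose actual rule is not given in the abstract.
    """
    aggregated = []
    for start in range(0, len(tokens), group_size):
        group = tokens[start:start + group_size]
        dim = len(group[0])
        aggregated.append(
            [sum(tok[d] for tok in group) / len(group) for d in range(dim)]
        )
    return aggregated


def aggregated_attention(queries, tokens, group_size):
    """Single-head scaled dot-product attention over aggregated tokens.

    Every query attends to the aggregated tokens, which serve as both keys
    and values, so the cost scales with len(tokens) / group_size per query.
    """
    kv = aggregate_tokens(tokens, group_size)
    scale = 1.0 / math.sqrt(len(queries[0]))
    outputs = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in kv]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, kv)) for d in range(len(q))]
        )
    return outputs


# Toy example: 4 tokens of dimension 2, aggregated into 2 key/value tokens.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
out = aggregated_attention(tokens, tokens, group_size=2)
```

In this sketch every query still produces one output token, so the spatial resolution of the feature map is kept while the number of keys and values shrinks by a factor of `group_size`.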

Results: The proposed OCT²Former is validated on three publicly available datasets: OCTA-SS, ROSE-1, and OCTA-500 (subsets OCTA-6M and OCTA-3M). Its Jaccard indexes on OCTA-SS, ROSE-1, OCTA-6M and OCTA-3M are 0.8344, 0.7855, 0.8099 and 0.8513, respectively, outperforming the best convolution-based network by 1.43%, 1.32%, 0.75% and 1.46%, respectively.
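For readers unfamiliar with the metric, the Jaccard index of a predicted vessel mask P against a ground-truth mask G is |P ∩ G| / |P ∪ G|. The snippet below is a minimal illustration on a toy 3×3 mask, not the evaluation code used in the study; the function name `jaccard_index` and the convention of scoring two empty masks as 1.0 are assumptions.

```python
def jaccard_index(pred, target):
    """Jaccard index (intersection over union) of two binary masks.

    Masks are nested lists of 0/1 values with identical shapes.
    """
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # ASSUMPTION: two empty masks are treated as a perfect match.
    return 1.0 if union == 0 else intersection / union


# Toy masks: 2 pixels overlap, 4 pixels are marked in at least one mask.
pred = [[1, 1, 0],
        [0, 1, 0],
        [0, 0, 0]]
target = [[1, 0, 0],
          [0, 1, 1],
          [0, 0, 0]]
score = jaccard_index(pred, target)  # 2 / 4 = 0.5
```

Because the union sits in the denominator, both missed vessel pixels and false detections lower the score, which is why the metric is sensitive to errors on thin vessels.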

Conclusion: The experimental results demonstrate that the proposed OCT²Former achieves competitive performance on retinal OCTA vessel segmentation tasks.

Keywords: Deep learning; Dynamic token aggregation; Optical coherence tomography angiography; Retinal vessel segmentation; Transformer.

MeSH terms

Fluorescein Angiography / methods
Mass Screening*
Retinal Vessels* / diagnostic imaging
Tomography, Optical Coherence / methods