Light3DHS: A lightweight 3D hippocampus segmentation method using multiscale convolution attention and vision transformer

Zhiyong Xiao; Yuhong Zhang; Zhaohong Deng; Fei Liu

doi:10.1016/j.neuroimage.2024.120608

Neuroimage. 2024 Apr 15;292:120608. doi: 10.1016/j.neuroimage.2024.120608. Epub 2024 Apr 16.

Authors

Zhiyong Xiao¹, Yuhong Zhang², Zhaohong Deng², Fei Liu³

Affiliations

¹ School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, 214122, China; Institut Fresnel, Centre National de la Recherche Scientifique, Marseille, 13397, France.
² School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, 214122, China.
³ Wuxi Hospital of Traditional Chinese Medicine, Wuxi, 214071, China. Electronic address: liufeicindy@163.com.

PMID: 38626817

Abstract

Morphological analysis and volume measurement of the hippocampus are crucial to the study of many brain diseases, so an accurate hippocampal segmentation method benefits clinical research on these diseases. U-Net and its variants have become prevalent in hippocampus segmentation of magnetic resonance imaging (MRI) because of their effectiveness, and Transformer-based architectures have also received some attention. However, some existing methods focus too much on the shape and volume of the hippocampus rather than its spatial information, and the features they extract are treated independently, ignoring the correlation between local and global features. In addition, many methods cannot be applied effectively to practical medical image segmentation because of their large parameter counts and high computational complexity. To this end, we combine the advantages of CNNs and Vision Transformers (ViTs) and propose a simple, lightweight model, Light3DHS, for 3D hippocampus segmentation. To obtain richer local contextual features, the encoder first uses a multi-scale convolutional attention (MCA) module to learn the spatial information of the hippocampus. Considering the importance of both local features and global semantics for 3D segmentation, we use a lightweight ViT to learn scale-invariant high-level features and to further fuse local-to-global representations. To evaluate the effectiveness of the encoder's feature representation, we design three decoders of different complexity to generate segmentation maps. Experiments on three common hippocampal datasets demonstrate that the network achieves more accurate hippocampus segmentation with fewer parameters, and that Light3DHS performs better than other state-of-the-art algorithms.

Keywords: 3D medical image segmentation; CNN; Lightweight; Multi-scale features fusion; Vision transformer.
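
Illustrative note (not part of the published record): the abstract motivates segmentation partly through hippocampal volume measurement. As a minimal sketch of how a volume could be read off a binary 3D segmentation map, the snippet below counts foreground voxels and multiplies by the voxel size. The function name mask_volume_mm3, the nested-list mask layout, and the voxel spacing values are assumptions made for illustration only; they are not taken from the Light3DHS paper or its code.

```python
from typing import Sequence, Tuple


def mask_volume_mm3(
    mask: Sequence[Sequence[Sequence[int]]],
    spacing_mm: Tuple[float, float, float],
) -> float:
    """Estimate the volume of a binary 3D mask in cubic millimetres.

    mask is indexed as mask[z][y][x]; any non-zero value counts as
    foreground (hypothetical layout chosen for this sketch).
    spacing_mm is the voxel size along (z, y, x) in millimetres.
    """
    # Count every foreground voxel in the volume.
    voxel_count = sum(
        1 for plane in mask for row in plane for value in row if value
    )
    dz, dy, dx = spacing_mm
    # Volume = number of foreground voxels x volume of one voxel.
    return voxel_count * dz * dy * dx


# Toy 2 x 2 x 2 mask with four foreground voxels (illustrative data only).
toy_mask = [
    [[0, 1], [1, 1]],
    [[0, 0], [1, 0]],
]

volume = mask_volume_mm3(toy_mask, (1.0, 1.0, 1.0))
print(f"Estimated volume: {volume} mm^3")
```

In practice the mask would come from a segmentation network's output and the spacing from the MRI header; both are left abstract here.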

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Deep Learning
Hippocampus* / diagnostic imaging
Humans
Imaging, Three-Dimensional* / methods
Magnetic Resonance Imaging* / methods
Neural Networks, Computer