RTC_TongueNet: An improved tongue image segmentation model based on DeepLabV3

Digit Health. 2024 Mar 28:10:20552076241242773. doi: 10.1177/20552076241242773. eCollection 2024 Jan-Dec.

Abstract

Objective: Tongue segmentation is the basis of automated tongue diagnosis studies in Chinese medicine, but existing segmentation networks suffer from defects such as network degradation and an inability to capture global features, which seriously affect segmentation quality. This article proposes RTC_TongueNet, an improved model based on DeepLabV3 that combines an improved residual structure with a transformer and integrates the ECA (Efficient Channel Attention) mechanism with multiscale atrous convolution to improve tongue image segmentation.

Methods: In this paper, we improve the DeepLabV3 backbone network by incorporating a transformer structure and an improved residual structure. The residual module is divided into two variants, applied under different conditions, to accelerate the mapping of shallow information into the deep network and thereby extract the low-level features of the tongue image more effectively. An ECA attention mechanism is introduced after the concat operation in the ASPP (Atrous Spatial Pyramid Pooling) structure to strengthen information interaction and fusion, extract local and global features effectively, and let the model focus on hard-to-segment regions such as the tongue edge, yielding a better segmentation result.
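The ECA step described above can be sketched as follows. This is a minimal pure-Python illustration, not the authors' code: it assumes the standard ECA recipe (global average pooling per channel, a 1-D convolution across channels with an adaptive odd kernel size, then a sigmoid gate), and it uses uniform convolution weights so the sketch stays self-contained, whereas real ECA learns those k weights.

```python
import math

def eca_weights(channel_means, gamma=2, b=1):
    """ECA sketch: one sigmoid gate per channel from a 1-D conv across channels.

    channel_means: list of floats, the global-average-pooled value of each
    channel after the ASPP concat (hypothetical input, for illustration).
    """
    c = len(channel_means)
    # Adaptive kernel size from the ECA paper: nearest odd value to
    # log2(C)/gamma + b/gamma.
    k = int(abs((math.log2(c) + b) / gamma))
    k = k if k % 2 else k + 1
    pad = k // 2
    # 1-D convolution across the channel dimension with uniform weights
    # (real ECA learns these k weights; uniform keeps the sketch runnable).
    padded = [0.0] * pad + list(channel_means) + [0.0] * pad
    conv = [sum(padded[i:i + k]) / k for i in range(c)]
    # Sigmoid gate for each channel.
    return [1.0 / (1.0 + math.exp(-v)) for v in conv]

def apply_eca(feature_map, weights):
    """Scale each channel (a list of H*W pixel values) by its gate."""
    return [[v * w for v in ch] for ch, w in zip(feature_map, weights)]
```

Because the gate is computed from a 1-D convolution over neighboring channels rather than a fully connected layer, ECA adds cross-channel interaction at negligible parameter cost, which is why it can be dropped in after the ASPP concat without noticeably enlarging the model.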

Results: The RTC_TongueNet model was compared with FCN (Fully Convolutional Network), UNet, LRASPP (Lite Reduced ASPP), and DeepLabV3 on two datasets. On both datasets, the MIOU (Mean Intersection over Union) and MPA (Mean Pixel Accuracy) of the baseline DeepLabV3 exceeded those of FCN, UNet, and LRASPP. Compared with DeepLabV3, RTC_TongueNet increased MIOU by 0.9% and MPA by 0.3% on the first dataset, and MIOU by 1.0% and MPA by 1.1% on the second. RTC_TongueNet performed best on both datasets.
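For reference, MIOU and MPA are computed from a per-class confusion matrix. The sketch below is a hypothetical helper (not the authors' evaluation code) that takes flat label lists; for tongue segmentation, num_classes=2 (background and tongue).

```python
def miou_mpa(pred, target, num_classes=2):
    """Mean IoU and Mean Pixel Accuracy from flat per-pixel label lists."""
    # Confusion matrix: rows = ground truth class, columns = predicted class.
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        cm[t][p] += 1
    ious, accs = [], []
    for c in range(num_classes):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                                  # missed pixels
        fp = sum(cm[r][c] for r in range(num_classes)) - tp   # false alarms
        if tp + fn + fp:
            ious.append(tp / (tp + fn + fp))   # IoU = TP / (TP + FN + FP)
        if tp + fn:
            accs.append(tp / (tp + fn))        # per-class pixel accuracy
    # Average over classes that actually occur.
    return sum(ious) / len(ious), sum(accs) / len(accs)
```

For example, with target pixels [0, 0, 0, 1] and predictions [0, 0, 1, 1], class 0 has IoU 2/3 and class 1 has IoU 1/2, so MIOU = 7/12 while MPA = (2/3 + 1)/2 = 5/6.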

Conclusion: In this study, based on DeepLabV3, we use the improved residual structure and transformer as the backbone to fully extract local and global image features. The ECA attention module is added to enhance channel attention, strengthening useful information and suppressing interference from useless information. The RTC_TongueNet model segments tongue images effectively, and this study has practical application value and reference value for tongue image segmentation.

Keywords: DeepLabV3; tongue segmentation; attention mechanism; improved residual structure; transformer.