LSAM: L2-norm self-attention and latent space feature interaction for automatic 3D multi-modal head and neck tumor segmentation

Phys Med Biol. 2023 Nov 6;68(22). doi: 10.1088/1361-6560/ad04a8.

Abstract

Objective. Head and neck (H&N) cancers are prevalent worldwide, and early, accurate detection is crucial for timely and effective treatment. However, segmenting H&N tumors in CT images is challenging because tumors and the surrounding tissues have similar densities. Positron emission tomography (PET) images, by contrast, reflect the metabolic activity of tissue and can distinguish lesion regions from normal tissue, but they are limited by their low spatial resolution. To fully exploit the complementary information in PET and CT images, we propose a multi-modal segmentation method designed specifically for H&N tumors.

Approach. The proposed network, LSAM, consists of two key learning modules, L2-norm self-attention and latent space feature interaction, which exploit the high sensitivity of PET images and the anatomical information in CT images. Together, these modules form a 3D segmentation network with a U-shaped structure. The network integrates complementary features from the two modalities at multiple scales, thereby strengthening the feature interaction between them.

Main results. We evaluated the proposed method on the public HECKTOR PET-CT dataset. The experimental results show that it outperforms existing H&N tumor segmentation methods on the key evaluation metrics: Dice similarity coefficient (DSC, 0.8457), Jaccard index (0.7756), relative volume difference (RVD, 0.0938), and 95th-percentile Hausdorff distance (HD95, 11.75).

Significance. The L2-norm-based self-attention mechanism is scalable and reduces the influence of outliers on model performance. The latent-space-based multi-scale feature interaction uses the learning process of the encoder stage to achieve the best complementary effect between the modalities.
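The overlap metrics reported in the main results can be illustrated with a short sketch. It is not the authors' evaluation code: it assumes binary 3D masks stored as NumPy arrays and uses the common textbook definitions DSC = 2|P∩G| / (|P| + |G|), Jaccard = |P∩G| / |P∪G|, and RVD = | |P| − |G| | / |G|. Note that some papers use a signed RVD instead. The function name `overlap_metrics` is hypothetical, and HD95, which requires surface-distance computations, is deliberately left out.

```python
import numpy as np


def overlap_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    """Compute DSC, Jaccard and RVD for two binary 3D masks.

    Hypothetical helper using common textbook definitions; RVD is taken
    as the absolute relative volume difference (an assumption).
    """
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    if pred.shape != gt.shape:
        raise ValueError("prediction and ground truth must have the same shape")

    vol_pred = int(pred.sum())   # number of predicted tumor voxels
    vol_gt = int(gt.sum())       # number of ground-truth tumor voxels
    if vol_gt == 0:
        raise ValueError("ground-truth mask is empty; RVD is undefined")

    intersection = int(np.logical_and(pred, gt).sum())
    union = int(np.logical_or(pred, gt).sum())

    dsc = 2.0 * intersection / (vol_pred + vol_gt)
    jaccard = intersection / union
    rvd = abs(vol_pred - vol_gt) / vol_gt
    return {"dsc": dsc, "jaccard": jaccard, "rvd": rvd}


# Toy example: the prediction over-segments the ground truth by one slice.
gt_mask = np.zeros((4, 4, 4), dtype=np.uint8)
gt_mask[1:3, 1:3, 1:3] = 1           # 8 ground-truth voxels
pred_mask = np.zeros((4, 4, 4), dtype=np.uint8)
pred_mask[1:3, 1:3, 1:4] = 1         # 12 predicted voxels, 8 of them correct
print(overlap_metrics(pred_mask, gt_mask))
```

In the toy example, 8 of the 12 predicted voxels overlap the 8 ground-truth voxels, giving DSC = 16/20 = 0.8, Jaccard = 8/12 ≈ 0.667 and RVD = 4/8 = 0.5. Per case, the two overlap scores are linked by Jaccard = DSC / (2 − DSC). This identity does not hold exactly for values averaged over a dataset, such as those reported above.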

Keywords: head and neck tumor; latent space; multi-modal segmentation; self-attention.

MeSH terms

  • Benchmarking
  • Head and Neck Neoplasms* / diagnostic imaging
  • Humans
  • Image Processing, Computer-Assisted
  • Positron Emission Tomography Computed Tomography*
  • Positron-Emission Tomography