Transformer guided self-adaptive network for multi-scale skin lesion image segmentation

Comput Biol Med. 2024 Feb;169:107846. doi: 10.1016/j.compbiomed.2023.107846. Epub 2023 Dec 23.

Abstract

Background: In recent years, skin lesions have become a major public health concern, and their diagnosis and management depend heavily on accurate segmentation of the lesions. Traditional convolutional neural networks (CNNs) have demonstrated promising results in skin lesion segmentation, but they are limited in their ability to capture long-range dependencies and intricate features. In addition, current medical image segmentation algorithms rarely account for how different categories are distributed across regions of the image, and they ignore the spatial relationships between pixels.

Objectives: This study proposes SapFormer, a self-adaptive position-aware skin lesion segmentation model that captures global context and fine-grained detail, better models spatial relationships, and adapts to position-specific characteristics. SapFormer is a multi-scale dynamic position-aware architecture designed to represent the relationships between skin lesion characteristics and lesion distribution more flexibly. It also increases segmentation accuracy for lesion areas and reduces incorrect segmentation of non-lesion areas.

Innovations: SapFormer employs multiple hybrid transformers for multi-scale feature encoding of skin images, and a transformer decoder performs multi-scale positional feature sensing on the encoded features to obtain fine-grained features of the lesion area and to optimize the regional feature distribution. The self-adaptive feature framework, built on the transformer decoder module, dynamically generates learnable parameterizations at different positions, derived from the multi-scale encoding characteristics of the input image. In parallel, a cross-attention network refines the features of the current region according to the features of other regions, further increasing skin lesion segmentation accuracy.
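To make this description concrete, the following minimal PyTorch sketch shows one plausible form of such a decoder step: queries are generated dynamically from the encoded features (rather than being fixed learned embeddings) and refined by cross-attention over all regions. This is an illustration under stated assumptions, not the authors' implementation; every module, dimension, and parameter name here is hypothetical.

```python
# Minimal sketch (not the authors' code) of a position-aware decoder block:
# multi-scale encoded tokens in, dynamically generated queries, and
# cross-attention that lets each region attend to all others.
import torch
import torch.nn as nn

class PositionAwareDecoderBlock(nn.Module):
    def __init__(self, dim=256, num_queries=100, num_heads=8):
        super().__init__()
        # Queries are derived from the input's own encoded features,
        # approximating "dynamically generated learnable parameterizations
        # at different positions" from the abstract.
        self.query_gen = nn.Linear(dim, dim)
        self.pool = nn.AdaptiveAvgPool1d(num_queries)
        # Cross-attention: each query (region) attends to all encoded tokens.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))

    def forward(self, feats):  # feats: (B, N, dim), N = H*W tokens at one scale
        # Generate position-aware queries from the encoded features.
        q = self.pool(self.query_gen(feats).transpose(1, 2)).transpose(1, 2)
        # Refine each region's representation using the other regions.
        attn_out, _ = self.cross_attn(q, feats, feats)
        q = self.norm(q + attn_out)
        return q + self.ffn(q)  # (B, num_queries, dim)

# Toy multi-scale usage: tokens flattened from two feature-map scales; a full
# model would fuse the decoded queries into per-pixel lesion masks.
f_quarter = torch.randn(2, 64 * 64, 256)  # 1/4-scale tokens
f_eighth = torch.randn(2, 32 * 32, 256)   # 1/8-scale tokens
block = PositionAwareDecoderBlock()
print(block(f_quarter).shape, block(f_eighth).shape)
```

Generating queries from the input, rather than using a fixed query bank, is what makes the parameterization "self-adaptive": the decoder's regional prototypes change with each image's feature distribution.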

Main results: The ISIC-2016, ISIC-2017, and ISIC-2018 skin lesion datasets serve as the experimental benchmarks. On these datasets, the proposed model achieves accuracy values of 97.9%, 94.3%, and 95.7%, IoU values of 93.2%, 86.4%, and 89.4%, and Dice similarity coefficient (DSC) values of 96.4%, 92.6%, and 94.3%, respectively. All three metrics surpass those of most state-of-the-art (SOTA) models, demonstrating that SapFormer can precisely segment skin lesions. Notably, the approach is markedly robust to noise in non-lesion areas while extracting finer-grained regional features from the skin lesion image.
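For reference, the reported overlap metrics follow their standard definitions. The short sketch below computes IoU and DSC for binary masks; it reflects the conventional formulas, not code taken from the paper.

```python
# Standard IoU and Dice (DSC) for binary segmentation masks.
import torch

def iou_and_dice(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7):
    """IoU = |P ∩ G| / |P ∪ G|; DSC = 2|P ∩ G| / (|P| + |G|)."""
    pred, target = pred.bool(), target.bool()
    inter = (pred & target).sum().float()
    union = (pred | target).sum().float()
    iou = inter / (union + eps)
    dice = 2 * inter / (pred.sum() + target.sum() + eps)
    return iou.item(), dice.item()

pred = torch.tensor([[1, 1, 0], [0, 1, 0]])
gt = torch.tensor([[1, 0, 0], [0, 1, 1]])
print(iou_and_dice(pred, gt))  # (0.5, 0.666...)
```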

Conclusions: Integrating a transformer-guided position-aware network into semantic skin lesion segmentation yields a notable performance boost. The proposed network's ability to capture spatial relationships and fine-grained details proves beneficial for effective skin lesion segmentation. By enhancing lesion localization, feature extraction, quantitative analysis, and classification accuracy, the proposed segmentation model improves the diagnostic efficiency of skin lesion analysis on dermoscopic images. It assists dermatologists in making more accurate and efficient diagnoses, ultimately leading to better patient care and outcomes. This research paves the way for advances in diagnosing and treating skin lesions, promoting better understanding and decision-making in clinical settings.

Keywords: Segmentation; Self-adaptive feature extraction; Skin lesion; Vision transformer.

MeSH terms

  • Algorithms
  • Benchmarking
  • Humans
  • Image Processing, Computer-Assisted
  • Neural Networks, Computer
  • Skin
  • Skin Diseases*