SMiT: symmetric mask transformer for disease severity detection

J Cancer Res Clin Oncol. 2023 Nov;149(17):16075-16086. doi: 10.1007/s00432-023-05223-x. Epub 2023 Sep 12.

Abstract

Purpose: The application of deep learning methods to the intelligent diagnosis of diseases has been a focus of intelligent medical research. In image classification tasks where the lesion area is small and unevenly distributed, the background regions included in training can affect the final accuracy in determining the extent of the lesion. Rather than following the traditional approach of building a CNN-based intelligent system to assist physicians in diagnosis, we propose a pure transformer framework for the diagnostic grading of pathological images.

Methods: We propose a Symmetric Mask Pre-Training vision Transformer (SMiT) model for grading pathological cancer images. To pre-train the vision transformer, SMiT applies a symmetric, identical, high-probability sparsification to the input image token sequence at the first and last encoder layer positions. The parameters of the baseline model are then fine-tuned after loading the pre-trained weights, which allows the model to concentrate on extracting detailed features in the lesion region and effectively removes the potential feature dependency problem.
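The masking step described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that "symmetric" means one randomly sampled mask is reused at both the first and the last encoder layer, it uses a hypothetical mask ratio of 0.75 as a stand-in for "high probability", and it models tokens as plain floats and encoder layers as arbitrary callables. All function names are hypothetical.

```python
import random


def sample_keep_indices(num_tokens, mask_ratio, seed=None):
    """Sample the token positions that survive masking (assumed scheme)."""
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    # Keep at least one token so the sequence is never fully masked.
    num_keep = max(1, round(num_tokens * (1.0 - mask_ratio)))
    return sorted(rng.sample(range(num_tokens), num_keep))


def apply_mask(tokens, keep_indices, fill=0.0):
    """Replace every token outside keep_indices with a fill value."""
    keep = set(keep_indices)
    return [tok if i in keep else fill for i, tok in enumerate(tokens)]


def symmetric_mask_forward(tokens, layers, mask_ratio=0.75, seed=0, fill=0.0):
    """Run toy encoder layers with the same mask at the first and last layer.

    `layers` is a non-empty list of callables mapping a token list to a
    token list; they stand in for transformer encoder blocks.
    """
    if not layers:
        raise ValueError("at least one encoder layer is required")
    keep = sample_keep_indices(len(tokens), mask_ratio, seed)
    x = apply_mask(tokens, keep, fill)   # mask before the first layer
    for layer in layers[:-1]:
        x = layer(x)
    x = apply_mask(x, keep, fill)        # identical mask before the last layer
    x = layers[-1](x)
    return x, keep
```

In a real vision transformer the tokens would be patch embeddings and masked positions would more likely be dropped or replaced by a learned mask token; the sketch only shows how a single sampled mask can be shared between the two positions.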

Results: SMiT achieved 92.8% classification accuracy on 4500 histopathological images of colorectal cancer denoised with a Gaussian filter. We validated the effectiveness and generalizability of the method on the publicly available diabetic retinopathy dataset APTOS2019 from Kaggle, achieving a quadratic weighted Cohen's kappa, accuracy, and F1-score of 91.9%, 86.91%, and 72.85%, respectively, which were 1-2% better than previous studies based on CNN models.
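For readers unfamiliar with the quadratic Cohen kappa reported above, the following sketch computes the standard quadratic weighted kappa for ordinal grades in pure Python. It is a generic illustration, not the evaluation code used in the study; the function name and the integer label encoding (grades 0 to num_classes - 1) are assumptions.

```python
def quadratic_weighted_kappa(y_true, y_pred, num_classes):
    """Quadratic weighted Cohen's kappa for integer grades 0..num_classes-1."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    if num_classes < 2:
        raise ValueError("num_classes must be at least 2")

    # Confusion matrix: rows are true grades, columns are predicted grades.
    observed = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1

    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(num_classes))
                  for j in range(num_classes)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            # Penalty grows with the squared distance between grades.
            weight = (i - j) ** 2 / (num_classes - 1) ** 2
            weighted_observed += weight * observed[i][j]
            # Expected count under chance agreement (independent marginals).
            weighted_expected += weight * row_totals[i] * col_totals[j] / n

    if weighted_expected == 0.0:
        # All labels and predictions fall in one grade: kappa is undefined.
        return float("nan")
    return 1.0 - weighted_observed / weighted_expected
```

Because the weight is quadratic in the grade distance, confusing adjacent severity grades is penalized far less than confusing distant ones, which is why this metric is commonly preferred over plain accuracy for severity grading.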

Conclusion: SMiT uses a simpler strategy to achieve better results, helping physicians make accurate clinical decisions, and may serve as an inspiration for making good use of vision transformers in the field of medical imaging.

Keywords: Colorectal cancer; Deep learning; Diabetic retinopathy; Intelligent diagnosis; Symmetric mask; Visual transformer.

MeSH terms

  • Biomedical Research*
  • Decision Making
  • Humans
  • Patient Acuity
  • Physicians*