CervixFormer: A Multi-scale Swin Transformer-Based Cervical Pap-Smear WSI Classification Framework

Comput Methods Programs Biomed. 2023 Oct:240:107718. doi: 10.1016/j.cmpb.2023.107718. Epub 2023 Jul 10.

Abstract

Background and objectives: Cervical cancer affects around 0.5 million women per year and causes over 0.3 million deaths. Regular screening for cervical cancer is therefore of utmost importance, and computer-assisted diagnosis is key to scaling it up. Current recognition algorithms, however, perform poorly on whole-slide image (WSI) analysis, fail to generalize across staining methods and uneven subtype distributions, and provide sub-optimal clinical-level interpretations. Herein, we developed CervixFormer, an end-to-end, multi-scale Swin Transformer-based adversarial ensemble learning framework to assess pre-cancerous and cancer-specific cervical malignant lesions on WSIs.

Methods: The proposed framework consists of (1) a self-attention generative adversarial network (SAGAN) that generates synthetic images during patch-level training to address class imbalance; (2) a multi-scale transformer-based ensemble learning method for identifying cells at various stages, including atypical squamous cells (ASC) and atypical squamous cells of undetermined significance (ASCUS), whose identification has not been demonstrated in previous studies; and (3) a fusion model that concatenates the ensemble outputs to produce the final predictions.
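The abstract does not specify how the SAGAN component is configured. As a minimal sketch of the self-attention layer that distinguishes SAGAN from a plain GAN, the pure-Python code below follows the formulation of the original SAGAN paper (Zhang et al., 2019), not anything confirmed for CervixFormer: attention weights are a softmax over dot products between projected features f(x_i) and g(x_j), and the attended output is added back to the input scaled by a learnable scalar gamma. The function and weight names (`sagan_self_attention`, `w_f`, `w_g`, `w_h`, `w_v`, `gamma`) are hypothetical, and the feature map is assumed to be flattened to one vector per spatial position.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax (subtracts the max score first)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sagan_self_attention(x: List[Vector], w_f: Matrix, w_g: Matrix,
                         w_h: Matrix, w_v: Matrix, gamma: float) -> List[Vector]:
    """Hypothetical SAGAN-style self-attention over N positions.

    x:             N feature vectors of length C (one per spatial position).
    w_f, w_g, w_h: C_bar x C projection matrices (SAGAN paper notation).
    w_v:           C x C_bar output projection.
    Returns y_j = gamma * o_j + x_j for every position j.
    """
    f = [matvec(w_f, xi) for xi in x]
    g = [matvec(w_g, xi) for xi in x]
    h = [matvec(w_h, xi) for xi in x]
    n, c_bar = len(x), len(h[0])
    y = []
    for j in range(n):
        # s_ij = f(x_i)^T g(x_j), normalized over all source positions i.
        scores = [sum(a * b for a, b in zip(f[i], g[j])) for i in range(n)]
        beta = softmax(scores)
        # Attention-weighted sum of the value features h(x_i).
        attended = [sum(beta[i] * h[i][k] for i in range(n)) for k in range(c_bar)]
        o_j = matvec(w_v, attended)
        # Residual connection scaled by the learnable gamma.
        y.append([gamma * o + xi for o, xi in zip(o_j, x[j])])
    return y
```

In the SAGAN paper, gamma is initialized to 0, so the layer starts as an identity mapping and the generator gradually learns how much to rely on non-local evidence.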

Results: The proposed method was first evaluated on a private dataset of 717 annotated samples from six classes, achieving a recall of 0.940 and a precision of 0.934 in roughly 1.2 minutes. To further examine its generalizability, CervixFormer was evaluated on four independent, publicly available datasets: the CRIC cervix, Mendeley LBC, SIPaKMeD Pap Smear, and Cervix93 Extended Depth of Field image datasets. CervixFormer achieved comparatively better performance on two-, three-, four-, and six-class classification of smear- and cell-level datasets. For clinical interpretation, we used GradCAM to visualize a coarse localization map highlighting important regions in the WSI. Notably, CervixFormer extracts features mostly from the cell nucleus and partially from the cytoplasm.
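GradCAM (Selvaraju et al., 2017) builds its coarse localization map by weighting each feature map of a chosen layer by the spatially averaged gradient of the class score with respect to that map, summing the weighted maps, and applying a ReLU. The pure-Python sketch below shows only that combination step and assumes the activations and gradients have already been extracted from the network; which layer CervixFormer uses is not stated in the abstract. The final rescaling to [0, 1] is a common visualization convention rather than something the source describes, and the function name `grad_cam` is hypothetical.

```python
from typing import List

Map = List[List[float]]  # one H x W feature map


def grad_cam(activations: List[Map], gradients: List[Map]) -> Map:
    """Hypothetical GradCAM combination step.

    activations[k]: H x W activation map A^k of channel k.
    gradients[k]:   H x W gradient of the class score w.r.t. A^k.
    Returns a ReLU-ed map rescaled to [0, 1] (all zeros if nothing is positive).
    """
    h, w = len(activations[0]), len(activations[0][0])
    # alpha_k: global average pooling of the gradients over spatial positions.
    alphas = [sum(sum(row) for row in g) / (h * w) for g in gradients]
    cam = [[0.0] * w for _ in range(h)]
    for alpha, a in zip(alphas, activations):
        for r in range(h):
            for c in range(w):
                cam[r][c] += alpha * a[r][c]
    # ReLU keeps only regions that increase the class score.
    cam = [[max(v, 0.0) for v in row] for row in cam]
    # Rescale for display (assumed convention, not from the source).
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam
```

Because the map has the spatial resolution of the chosen layer, it is typically upsampled to the input patch size before being overlaid on the image, which is why GradCAM maps are described as coarse.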

Conclusions: CervixFormer outperforms existing state-of-the-art benchmark methods in terms of recall, accuracy, and computing time.

Keywords: Cervical cancer; Image classification; Medical data augmentation; Swin transformer; WSI Analysis.

MeSH terms

  • Cervix Uteri / diagnostic imaging
  • Cervix Uteri / pathology
  • Diagnosis, Computer-Assisted
  • Early Detection of Cancer / methods
  • Female
  • Humans
  • Papanicolaou Test*
  • Uterine Cervical Neoplasms* / diagnostic imaging