Self-supervised learning-based cervical cytology for the triage of HPV-positive women in resource-limited settings and low-data regime

Comput Biol Med. 2024 Feb:169:107809. doi: 10.1016/j.compbiomed.2023.107809. Epub 2023 Dec 6.

Abstract

Screening Papanicolaou test samples has proven to be highly effective in reducing cervical cancer-related mortality. However, the lack of trained cytopathologists hinders its widespread implementation in low-resource settings. Deep learning-assisted telecytology diagnosis emerges as an appealing alternative, but it requires the collection of large annotated training datasets, which is costly and time-consuming. In this paper, we demonstrate that the abundance of unlabeled images that can be extracted from Pap smear test whole slide images presents a fertile ground for self-supervised learning methods, yielding performance improvements compared to off-the-shelf pre-trained models for various downstream tasks. In particular, we propose Cervical Cell Copy-Pasting (C3P) as an effective augmentation method, which enables knowledge transfer from public and labeled single-cell datasets to unlabeled tiles. Not only does C3P outperform naive transfer from single-cell images, but we also demonstrate its advantageous integration into multiple instance learning methods. Importantly, all our experiments are conducted on an in-house dataset that we introduce, comprising liquid-based cytology Pap smear images obtained using low-cost technologies. This aligns with our long-term objective of deep learning-assisted telecytology for diagnosis in low-resource settings.
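
The abstract does not give implementation details for C3P, so the following is only a minimal sketch of the general copy-paste idea it names: a labeled single-cell crop is pasted onto an unlabeled tile, and the resulting tile can then carry the cell's label. The function name paste_cell, the boolean foreground mask, and the uniform random placement are all illustrative assumptions, not the authors' method.

```python
import numpy as np

def paste_cell(tile, cell, mask, rng):
    """Paste the masked pixels of `cell` at a random position in a copy of `tile`.

    Hypothetical helper for illustration only (not the published C3P code).
    tile: (H, W, C) array, an unlabeled tile.
    cell: (h, w, C) array, a labeled single-cell crop with h <= H and w <= W.
    mask: (h, w) boolean array, True where the cell's pixels should be copied.
    rng:  numpy.random.Generator used to draw the paste position.
    Returns the augmented tile and the (top, left) paste position.
    """
    tile_h, tile_w = tile.shape[:2]
    cell_h, cell_w = cell.shape[:2]
    if cell_h > tile_h or cell_w > tile_w:
        raise ValueError("cell crop must fit inside the tile")

    # Assumption: the position is drawn uniformly over all valid placements.
    top = int(rng.integers(0, tile_h - cell_h + 1))
    left = int(rng.integers(0, tile_w - cell_w + 1))

    out = tile.copy()  # leave the original tile untouched
    region = out[top:top + cell_h, left:left + cell_w]  # view into `out`
    region[mask] = cell[mask]  # copy only the foreground pixels
    return out, (top, left)
```

Under these assumptions, a training pair could be formed by calling paste_cell(tile, cell, mask, np.random.default_rng(0)) and assigning the single-cell label to the returned tile. How the published method selects cells, blends borders, or assigns labels is not stated in the abstract.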

Keywords: Digital cytology; Pasting augmentation; Self-supervised learning; WSIs classification.

MeSH terms

  • Cytology
  • Female
  • Humans
  • Papillomavirus Infections* / diagnosis
  • Resource-Limited Settings
  • Supervised Machine Learning
  • Triage
  • Uterine Cervical Neoplasms* / diagnosis