FaxMatch: Multi-Curriculum Pseudo-Labeling for semi-supervised medical image classification

Med Phys. 2023 May;50(5):3210-3222. doi: 10.1002/mp.16312. Epub 2023 Feb 21.

Abstract

Background: Semi-supervised learning (SSL) can effectively use information from unlabeled data to improve model performance, which is of great significance in medical imaging tasks. Pseudo-labeling is a classical SSL method that uses a model to predict labels for unlabeled samples, takes the highest-confidence predictions as pseudo-labels, and then trains the model on those pseudo-labels. Most current pseudo-label-based SSL algorithms use a predefined fixed threshold, shared by all classes, to select unlabeled data.
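The fixed-threshold selection described above can be sketched as follows. This is a minimal illustration, not code from the paper: the function name `select_pseudo_labels`, the threshold value 0.95, and the example probabilities are all assumptions made for the sketch.

```python
def select_pseudo_labels(probs, threshold=0.95):
    """Return (sample_index, predicted_class) pairs whose top confidence
    meets one fixed threshold shared by all classes.

    `probs` is a list of per-sample class-probability lists.
    The threshold value is an illustrative assumption.
    """
    selected = []
    for i, p in enumerate(probs):
        confidence = max(p)          # highest predicted probability
        label = p.index(confidence)  # class with that probability
        if confidence >= threshold:
            selected.append((i, label))
    return selected


# Hypothetical softmax outputs for four unlabeled images and three classes.
probs = [
    [0.97, 0.02, 0.01],
    [0.60, 0.30, 0.10],
    [0.03, 0.96, 0.01],
    [0.10, 0.15, 0.75],
]
pseudo = select_pseudo_labels(probs, threshold=0.95)
```

In this example the last sample, whose best class reaches only 0.75, is discarded regardless of how hard that class is to learn, which is the limitation a shared threshold has under class imbalance.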

Purpose: Data imbalance is a common problem in medical imaging tasks, and using a single fixed threshold to generate pseudo-labels ignores the differing learning status and learning difficulty of each class. The aim of this study is to develop an algorithm that addresses this problem.

Methods: We propose Multi-Curriculum Pseudo-Labeling (MCPL), which evaluates the model's learning status for each class at each epoch and automatically adjusts the threshold for each class. We apply MCPL to FixMatch to obtain a new SSL framework for medical image classification, which we call FaxMatch. To mitigate the impact of incorrect pseudo-labels on the model, we use a label smoothing (LS) strategy to generate soft labels (SL) for the pseudo-labels.
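The abstract does not give the MCPL formulas, so the sketch below only illustrates the general idea of per-class thresholds combined with label-smoothed pseudo-labels. Everything specific in it is an assumption: the learning-status estimate (the count of confident predictions per class, normalized by the largest count), the linear mapping between a hypothetical `floor` and `base` threshold, the smoothing factor `eps`, and all function names.

```python
def class_thresholds(probs, num_classes, base=0.9, floor=0.5):
    """Per-class thresholds from a simple learning-status estimate.

    Assumption (not taken from the paper): the learning status of a class
    is the number of unlabeled samples predicted as that class with
    confidence >= base, normalized by the largest such count. Well-learned
    classes keep a threshold near `base`; poorly learned classes get a
    threshold closer to `floor`.
    """
    counts = [0] * num_classes
    for p in probs:
        conf = max(p)
        if conf >= base:
            counts[p.index(conf)] += 1
    top = max(counts)
    if top == 0:
        # No confident predictions yet: fall back to the shared threshold.
        return [base] * num_classes
    return [floor + (base - floor) * c / top for c in counts]


def smooth_label(label, num_classes, eps=0.1):
    """Standard label smoothing: (1 - eps) * one_hot + eps / num_classes."""
    off = eps / num_classes
    return [off + (1.0 - eps if k == label else 0.0) for k in range(num_classes)]


def soft_pseudo_labels(probs, thresholds, eps=0.1):
    """Keep samples whose top confidence meets the threshold of their
    predicted class, and return (index, soft_label) pairs."""
    out = []
    for i, p in enumerate(probs):
        conf = max(p)
        label = p.index(conf)
        if conf >= thresholds[label]:
            out.append((i, smooth_label(label, len(p), eps)))
    return out


# Hypothetical softmax outputs for four unlabeled images and three classes.
probs = [
    [0.95, 0.03, 0.02],
    [0.92, 0.05, 0.03],
    [0.05, 0.91, 0.04],
    [0.30, 0.10, 0.60],
]
thresholds = class_thresholds(probs, num_classes=3)
soft = soft_pseudo_labels(probs, thresholds)
```

In practice the thresholds would be recomputed at every epoch from the current model's predictions, so that classes the model has not yet learned well admit more pseudo-labels early in training. The soft labels spread a small amount of probability mass over the other classes, which limits the penalty when a pseudo-label turns out to be wrong.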

Results: We conducted extensive experiments to evaluate our method on two public benchmark medical image classification datasets: the ISIC 2018 skin lesion analysis dataset and the COVID-CT dataset. Experimental results show that our method outperforms the fully supervised baseline, which uses only labeled data to train the model. Our method also outperforms other state-of-the-art methods.

Conclusions: We propose MCPL and construct a semi-supervised medical image classification framework to reduce the reliance of the model on a large number of labeled images and reduce the manual workload of labeling medical image data.

Keywords: curriculum learning; medical image classification; semi-supervised learning.

MeSH terms

  • Algorithms
  • Benchmarking
  • COVID-19*
  • Curriculum
  • Humans
  • Supervised Machine Learning
