Self-supervised enhanced thyroid nodule detection in ultrasound examination video sequences with multi-perspective evaluation

Phys Med Biol. 2023 Nov 28;68(23). doi: 10.1088/1361-6560/ad092a.

Abstract

Objective. Ultrasound is the most commonly used examination for the detection and identification of thyroid nodules. Since manual detection is time-consuming and subjective, attempts to introduce machine learning into this process are ongoing. However, the performance of these methods is limited by the low signal-to-noise ratio and tissue contrast of ultrasound images. To address these challenges, we extend thyroid nodule detection from image-based to video-based using the temporal context information in ultrasound videos.

Approach. We propose a video-based deep learning model with adjacent frame perception (AFP) for accurate and real-time thyroid nodule detection. Compared to image-based methods, AFP can aggregate semantically similar contextual features in the video. Furthermore, considering the cost of medical image annotation for video-based models, a patch scale self-supervised model (PASS) is proposed. PASS is trained on unlabeled datasets to improve the performance of the AFP model without additional labelling costs.

Main results. The PASS model was trained on 92 videos containing 23 773 frames, of which 60 annotated videos containing 16 694 frames were used to train and evaluate the AFP model. The evaluation was performed from the video, frame, nodule, and localization perspectives. For the localization perspective, we used the average precision metric with the intersection-over-union threshold set to 50% (AP@50), which is the area under the smoothed Precision-Recall curve. Our proposed AFP improved AP@50 from 0.256 to 0.390, while the PASS-enhanced AFP further improved AP@50 to 0.425. AFP and PASS also improved performance in the evaluations from the other perspectives, which are based on the localization results.

Significance. Our video-based model can mitigate the effects of low signal-to-noise ratio and tissue contrast in ultrasound images and enable the accurate detection of thyroid nodules in real time. The multi-perspective evaluation of the ablation experiments demonstrates the effectiveness of our proposed AFP and PASS models.
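
The AP@50 metric described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function names `iou` and `average_precision`, the `(frame, score, box)` detection format, and the greedy one-to-one matching are assumptions made for illustration, and "smoothed" is assumed here to mean the usual monotone precision envelope used in common detection benchmarks.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(
    detections: List[Tuple[int, float, Box]],  # (frame index, confidence, box)
    ground_truth: Dict[int, List[Box]],        # frame index -> annotated boxes
    iou_thr: float = 0.5,                      # 0.5 corresponds to AP@50
) -> float:
    n_gt = sum(len(boxes) for boxes in ground_truth.values())
    if n_gt == 0:
        return 0.0
    matched = {f: [False] * len(b) for f, b in ground_truth.items()}

    # Visit detections from most to least confident and mark each one as a
    # true positive if it overlaps a not-yet-matched annotation well enough.
    hits = []
    for frame, _score, box in sorted(detections, key=lambda d: -d[1]):
        best_iou, best_j = 0.0, -1
        for j, gt_box in enumerate(ground_truth.get(frame, [])):
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        hit = best_j >= 0 and best_iou >= iou_thr and not matched[frame][best_j]
        if hit:
            matched[frame][best_j] = True
        hits.append(hit)

    # Running precision and recall along the ranked list.
    precisions, recalls, tp = [], [], 0
    for k, hit in enumerate(hits, start=1):
        tp += int(hit)
        precisions.append(tp / k)
        recalls.append(tp / n_gt)

    # Assumed smoothing: make precision non-increasing from right to left.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Area under the smoothed precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

With two annotated frames and three detections (one exact match, one false positive, one box with an IoU of exactly 0.5), `average_precision` yields 5/6 at the 50% threshold, and the score drops if `iou_thr` is raised because the borderline detection no longer counts as a match.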

Keywords: deep learning; self-supervised pre-training; thyroid nodule; ultrasound video; video object detection.

MeSH terms

  • Humans
  • Machine Learning
  • Signal-To-Noise Ratio
  • Thyroid Nodule* / diagnostic imaging
  • Ultrasonography
  • alpha-Fetoproteins

Substances

  • alpha-Fetoproteins