Evaluation of Semiautomatic and Deep Learning-Based Fully Automatic Segmentation Methods on [18F]FDG PET/CT Images from Patients with Lymphoma: Influence on Tumor Characterization

J Digit Imaging. 2023 Aug;36(4):1864-1876. doi: 10.1007/s10278-023-00823-y. Epub 2023 Apr 14.

Abstract

The objective was to assess the performance of seven semiautomatic and two fully automatic segmentation methods on [18F]FDG PET/CT lymphoma images and to evaluate their influence on tumor quantification. All lymphoma lesions identified in 65 whole-body [18F]FDG PET/CT staging images were segmented by two experienced observers using manual and semiautomatic methods. The semiautomatic methods were absolute and relative thresholds, k-means and Bayesian clustering, and a self-adaptive configuration (SAC) of k-means and Bayesian clustering. Three state-of-the-art deep learning-based segmentation methods using a 3D U-Net architecture were also applied: one semiautomatic and two fully automatic, one of the latter being publicly available. The Dice coefficient (DC) measured segmentation overlap, with manual segmentation taken as the ground truth. Lymphoma lesions were characterized by 31 features. The intraclass correlation coefficient (ICC) assessed feature agreement between segmentation methods. Nine hundred twenty [18F]FDG-avid lesions were identified. The SAC Bayesian method achieved the highest median intra-observer DC (0.87). Inter-observer DC was higher for SAC Bayesian than for manual segmentation (0.94 vs 0.84, p < 0.001). The median DC of the semiautomatic deep learning-based method was promising (0.83 (Obs1), 0.79 (Obs2)). Threshold-based methods and the publicly available 3D U-Net gave poorer results (0.56 ≤ DC ≤ 0.68). Maximum, mean, and peak standardized uptake values, metabolic tumor volume, and total lesion glycolysis showed excellent agreement (ICC ≥ 0.92) between manual and SAC Bayesian segmentation. The SAC Bayesian classifier is more reproducible than manual segmentation and produces similar lesion features, giving the most concordant results of all the methods evaluated. Deep learning-based segmentation can achieve good overall results but failed in a few patients, which would affect their clinical evaluation.
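
The overlap and quantification measures named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `dice_coefficient` and `lesion_features`, the toy arrays, and the 0.5 mL voxel volume are assumptions made for illustration. It uses the standard definitions DC = 2|A ∩ B| / (|A| + |B|), metabolic tumor volume (MTV) as the segmented voxel count times the voxel volume, and total lesion glycolysis (TLG) as mean SUV times MTV.

```python
import numpy as np


def dice_coefficient(mask_a, mask_b):
    """Dice overlap between two binary masks of the same shape."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        # Assumed convention: two empty masks are treated as perfect agreement.
        return 1.0
    return 2.0 * np.logical_and(a, b).sum() / total


def lesion_features(suv, mask, voxel_volume_ml):
    """SUVmax, SUVmean, MTV (mL) and TLG for one segmented lesion."""
    values = np.asarray(suv, dtype=float)[np.asarray(mask, dtype=bool)]
    if values.size == 0:
        raise ValueError("mask contains no voxels")
    mtv = values.size * voxel_volume_ml
    suv_mean = values.mean()
    return {
        "SUVmax": float(values.max()),
        "SUVmean": float(suv_mean),
        "MTV_ml": float(mtv),
        "TLG": float(suv_mean * mtv),
    }


# Toy example (hypothetical data): a "manual" mask of 8 voxels and an
# "automatic" mask of 12 voxels that fully contains it.
manual = np.zeros((4, 4, 4), dtype=bool)
manual[1:3, 1:3, 1:3] = True
automatic = np.zeros((4, 4, 4), dtype=bool)
automatic[1:3, 1:3, 1:4] = True

suv = np.full((4, 4, 4), 2.0)
suv[1, 1, 1] = 10.0  # one hot voxel inside the lesion

print(dice_coefficient(manual, automatic))  # 2*8 / (8+12) = 0.8
print(lesion_features(suv, manual, voxel_volume_ml=0.5))
```

Because the automatic mask over-segments by four voxels, DC falls to 0.8 while SUVmax is unchanged; volume-dependent features such as MTV and TLG are the ones that shift with the segmentation boundary.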

Keywords: Artificial intelligence; Computer-assisted image analysis; Lymphoma; Reproducibility of results; [18F]FDG PET/CT.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bayes Theorem
  • Deep Learning*
  • Fluorodeoxyglucose F18 / metabolism
  • Humans
  • Lymphoma* / diagnostic imaging
  • Neoplasms*
  • Positron Emission Tomography Computed Tomography / methods

Substances

  • Fluorodeoxyglucose F18