Evaluation of Semiautomatic and Deep Learning-Based Fully Automatic Segmentation Methods on [18F]FDG PET/CT Images from Patients with Lymphoma: Influence on Tumor Characterization

Cláudia S Constantino; Sónia Leocádio; Francisco P M Oliveira; Mariana Silva; Carla Oliveira; Joana C Castanheira; Ângelo Silva; Sofia Vaz; Ricardo Teixeira; Manuel Neves; Paulo Lúcio; Cristina João; Durval C Costa

doi:10.1007/s10278-023-00823-y

J Digit Imaging. 2023 Aug;36(4):1864-1876. doi: 10.1007/s10278-023-00823-y. Epub 2023 Apr 14.

Affiliations

¹ Nuclear Medicine - Radiopharmacology Department, Champalimaud Foundation, Av. Brasília, 1400-038, Lisbon, Portugal. claudia.constantino@research.fchampalimaud.org.
² Hematology Department, Champalimaud Foundation, Av. Brasília, 1400-038, Lisbon, Portugal.
³ Nuclear Medicine - Radiopharmacology Department, Champalimaud Foundation, Av. Brasília, 1400-038, Lisbon, Portugal.

Abstract

The objective is to assess the performance of seven semiautomatic and two fully automatic segmentation methods on [¹⁸F]FDG PET/CT lymphoma images and evaluate their influence on tumor quantification. All lymphoma lesions identified in 65 whole-body [¹⁸F]FDG PET/CT staging images were segmented by two experienced observers using manual and semiautomatic methods. Semiautomatic segmentation using absolute and relative thresholds, k-means and Bayesian clustering, and a self-adaptive configuration (SAC) of k-means and Bayesian was applied. Three state-of-the-art deep learning-based segmentation methods using a 3D U-Net architecture were also applied. One was semiautomatic and two were fully automatic, one of which is publicly available. The Dice coefficient (DC) measured segmentation overlap, with manual segmentation taken as the ground truth. Lymphoma lesions were characterized by 31 features. The intraclass correlation coefficient (ICC) assessed feature agreement between different segmentation methods. Nine hundred twenty [¹⁸F]FDG-avid lesions were identified. The SAC Bayesian method achieved the highest median intra-observer DC (0.87). Inter-observer DC was higher for SAC Bayesian than for manual segmentation (0.94 vs 0.84, p < 0.001). The median DC of the semiautomatic deep learning-based method was promising (0.83 for observer 1, 0.79 for observer 2). Threshold-based methods and the publicly available 3D U-Net gave poorer results (0.56 ≤ DC ≤ 0.68). Maximum, mean, and peak standardized uptake values, metabolic tumor volume, and total lesion glycolysis showed excellent agreement (ICC ≥ 0.92) between manual and SAC Bayesian segmentation methods. The SAC Bayesian classifier is more reproducible than manual segmentation and produces similar lesion features, giving the most concordant results of all methods. Deep learning-based segmentation can achieve good overall results but failed in a few patients, affecting those patients' clinical evaluation.
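For readers unfamiliar with the overlap and quantification measures named in the abstract, the following is a minimal Python sketch, not the authors' implementation. It assumes segmentations are stored as binary voxel masks and uses the standard definitions: DC = 2|A ∩ B| / (|A| + |B|), metabolic tumor volume (MTV) as the segmented voxel count times the voxel volume, and total lesion glycolysis (TLG) as mean SUV times MTV. The function names, the `voxel_volume_ml` parameter, and the convention for two empty masks are illustrative assumptions.

```python
import numpy as np


def dice_coefficient(mask_a, mask_b):
    """Dice coefficient between two binary masks of the same shape."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("masks must have the same shape")
    total = a.sum() + b.sum()
    if total == 0:
        # Assumed convention: two empty masks are treated as perfect agreement.
        return 1.0
    return 2.0 * np.logical_and(a, b).sum() / total


def lesion_features(suv, mask, voxel_volume_ml):
    """SUVmax, SUVmean, MTV (mL) and TLG for one segmented lesion.

    `suv` is an array of standardized uptake values and `mask` a binary
    segmentation of the same shape (hypothetical inputs for illustration).
    """
    m = np.asarray(mask, dtype=bool)
    if not m.any():
        raise ValueError("mask is empty; no lesion voxels to characterize")
    values = np.asarray(suv, dtype=float)[m]
    mtv = float(m.sum()) * voxel_volume_ml   # volume of the segmented region
    suv_mean = float(values.mean())
    return {
        "SUVmax": float(values.max()),
        "SUVmean": suv_mean,
        "MTV": mtv,
        "TLG": suv_mean * mtv,               # standard definition: SUVmean x MTV
    }
```

Because MTV and TLG depend directly on which voxels are included, a segmentation method with a low DC against the manual delineation will generally shift these features as well, which is the link between overlap and feature agreement that the study examines.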

Keywords: Artificial intelligence; Computer-assisted image analysis; Lymphoma; Reproducibility of results; [18F]FDG PET/CT.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Bayes Theorem
Deep Learning*
Fluorodeoxyglucose F18 / metabolism
Humans
Lymphoma* / diagnostic imaging
Neoplasms*
Positron Emission Tomography Computed Tomography / methods

Substances

Fluorodeoxyglucose F18