Accurate object localization facilitates automatic esophagus segmentation in deep learning

Radiat Oncol. 2024 May 12;19(1):55. doi: 10.1186/s13014-024-02448-z.

Abstract

Background: Currently, automatic esophagus segmentation remains a challenging task due to its small size, low contrast, and large shape variation. We aimed to improve the performance of esophagus segmentation in deep learning by applying a strategy that involves locating the object first and then performing the segmentation task.

Methods: A total of 100 cases with thoracic computed tomography scans from two publicly available datasets were used in this study. A modified CenterNet, an object location network, was employed to locate the center of the esophagus for each slice. Subsequently, the 3D U-net and 2D U-net_coarse models were trained to segment the esophagus based on the predicted object center. A 2D U-net_fine model was trained based on the updated object center according to the 3D U-net model. The dice similarity coefficient and the 95% Hausdorff distance were used as quantitative evaluation indexes for the delineation performance. The characteristics of the automatically delineated esophageal contours by the 2D U-net and 3D U-net models were summarized. Additionally, the impact of the accuracy of object localization on the delineation performance was analyzed. Finally, the delineation performance in different segments of the esophagus was also summarized.

Results: The mean dice coefficient of the 3D U-net, 2D U-net_coarse, and 2D U-net_fine models were 0.77, 0.81, and 0.82, respectively. The 95% Hausdorff distance for the above models was 6.55, 3.57, and 3.76, respectively. Compared with the 2D U-net, the 3D U-net has a lower incidence of delineating wrong objects and a higher incidence of missing objects. After using the fine object center, the average dice coefficient was improved by 5.5% in the cases with a dice coefficient less than 0.75, while that value was only 0.3% in the cases with a dice coefficient greater than 0.75. The dice coefficients were lower for the esophagus between the orifice of the inferior and the pulmonary bifurcation compared with the other regions.

Conclusion: The 3D U-net model tended to delineate fewer incorrect objects but also miss more objects. Two-stage strategy with accurate object location could enhance the robustness of the segmentation model and significantly improve the esophageal delineation performance, especially for cases with poor delineation results.

Keywords: Automatic segmentation; Deep learning; Esophagus; Object localization.

MeSH terms

  • Deep Learning*
  • Esophagus* / diagnostic imaging
  • Humans
  • Image Processing, Computer-Assisted / methods
  • Imaging, Three-Dimensional / methods
  • Tomography, X-Ray Computed / methods