Echocardiographic image multi-structure segmentation using Cardiac-SegNet

Med Phys. 2021 May;48(5):2426-2437. doi: 10.1002/mp.14818. Epub 2021 Apr 1.

Abstract

Purpose: Cardiac boundary segmentation of echocardiographic images is important for cardiac function assessment and disease diagnosis. However, it is challenging to segment cardiac ventricles due to the low contrast-to-noise ratio and speckle noise of the echocardiographic images. Manual segmentation is subject to interobserver variability and is too slow for real-time image-guided interventions. We aim to develop a deep learning-based method for automated multi-structure segmentation of echocardiographic images.

Methods: We developed an anchor-free mask convolutional neural network (CNN), termed Cardiac-SegNet, which consists of three subnetworks: a backbone, a fully convolutional one-stage object detector (FCOS) head, and a mask head. The backbone extracts multi-level and multi-scale features from the echocardiographic image. The FCOS head uses these features to detect and label the regions of interest (ROIs) of the segmentation targets. Unlike the traditional mask regional CNN (Mask R-CNN) method, the FCOS head is anchor-free and can model the spatial relationship of the targets. The mask head uses a spatial attention strategy, which allows the network to highlight salient features when segmenting each detected ROI. For evaluation, we tested the method on 450 patient datasets using five-fold cross-validation and a hold-out test. The endocardium (LVEndo) and epicardium (LVEpi) of the left ventricle and the left atrium (LA) were segmented and compared with manual contours using the Dice similarity coefficient (DSC), Hausdorff distance (HD), mean absolute distance (MAD), and center-of-mass distance (CMD).
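The four evaluation metrics named above can be illustrated with a minimal NumPy sketch. This is not the authors' evaluation code. It assumes each structure is given as a non-empty 2D binary mask, that distances are measured in pixels between 4-connected boundary pixels, and that MAD is the symmetric average of the two directed mean boundary distances. The function names are hypothetical.

```python
import numpy as np

def dice(a, b):
    """Dice similarity coefficient between two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(a, b).sum() / denom

def boundary(mask):
    """Mask pixels that have at least one 4-neighbour outside the mask."""
    m = np.pad(mask.astype(bool), 1)
    interior = m[:-2, 1:-1] & m[2:, 1:-1] & m[1:-1, :-2] & m[1:-1, 2:]
    return mask.astype(bool) & ~interior

def _boundary_distances(a, b):
    """Pairwise Euclidean distances between boundary pixels of a and b."""
    pa = np.argwhere(boundary(a)).astype(float)
    pb = np.argwhere(boundary(b)).astype(float)
    return np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=-1)

def hausdorff(a, b):
    """Symmetric Hausdorff distance (pixels) between the two boundaries."""
    d = _boundary_distances(a, b)
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))

def mean_absolute_distance(a, b):
    """Average of the two directed mean boundary-to-boundary distances."""
    d = _boundary_distances(a, b)
    return float(0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean()))

def center_of_mass_distance(a, b):
    """Euclidean distance (pixels) between the mask centroids."""
    ca = np.argwhere(a).mean(axis=0)
    cb = np.argwhere(b).mean(axis=0)
    return float(np.linalg.norm(ca - cb))
```

To report HD, MAD, and CMD in millimetres rather than pixels, the pixel coordinates would need to be scaled by the image's pixel spacing before the distances are computed.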

Results: Compared to U-Net and Mask R-CNN, our method achieved higher segmentation accuracy and fewer erroneous speckles. When our method was evaluated on a separate hold-out dataset at the end-diastole (ED) and end-systole (ES) phases, the average DSCs were 0.952 and 0.939 at ED and ES for the LVEndo, 0.965 and 0.959 at ED and ES for the LVEpi, and 0.924 and 0.926 at ED and ES for the LA. For patients with a typical image size of 549 × 788 pixels, the proposed method can perform the segmentation within 0.5 s.

Conclusion: We proposed a fast and accurate method to segment echocardiographic images using an anchor-free mask CNN.

Keywords: CNN; cardiac; deep learning; segmentation; ultrasound.

MeSH terms

  • Echocardiography
  • Heart / diagnostic imaging
  • Heart Ventricles / diagnostic imaging
  • Humans
  • Image Processing, Computer-Assisted*
  • Neural Networks, Computer*