Deep Ensembles Are Robust to Occasional Catastrophic Failures of Individual DNNs for Organs Segmentations in CT Images

Yury Petrov; Bilal Malik; Jill Fredrickson; Skander Jemaa; Richard A D Carano

doi:10.1007/s10278-023-00857-2

J Digit Imaging. 2023 Oct;36(5):2060-2074. doi: 10.1007/s10278-023-00857-2. Epub 2023 Jun 8.

Authors

Yury Petrov¹, Bilal Malik², Jill Fredrickson², Skander Jemaa², Richard A D Carano²

Affiliations

¹ Genentech, Inc., 1 DNA Way, South San Francisco, CA, 94080, USA. petrov.yury@gene.com.
² Genentech, Inc., 1 DNA Way, South San Francisco, CA, 94080, USA.

Abstract

Deep neural networks (DNNs) have recently shown remarkable performance in various computer vision tasks, including classification and segmentation of medical images. Deep ensembles, which aggregate the predictions of multiple DNNs, have been shown to improve DNN performance in various classification tasks. Here we explore how deep ensembles perform in image segmentation, specifically organ segmentation in CT (computed tomography) images. Ensembles of V-Nets were trained to segment multiple organs using several in-house and publicly available clinical studies. The ensembles' segmentations were tested on images from a different set of studies, and the effects of ensemble size and other ensemble parameters were explored for various organs. Compared to single models, deep ensembles significantly improved the average segmentation accuracy, especially for those organs where the accuracy was lower. More importantly, deep ensembles strongly reduced both the occasional "catastrophic" segmentation failures characteristic of single models and the variability of segmentation accuracy from image to image. To quantify this, we defined "high-risk images" as images for which at least one model produced an outlier metric (performed in the lowest 5th percentile). These images comprised about 12% of the test images across all organs. Ensembles performed without outliers for 68% to 100% of the high-risk images, depending on the performance metric used.
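The "high-risk image" definition above can be illustrated with a minimal sketch. This is not the authors' code. It assumes a higher-is-better metric such as the Dice score, a threshold taken as the 5th percentile of single-model scores pooled across all models, and an ensemble that counts as outlier-free on an image when its score is at or above that same threshold. The abstract does not specify these details, and the function names are hypothetical.

```python
import numpy as np


def high_risk_mask(single_scores, pct=5.0):
    """Flag images on which at least one single model is an outlier.

    single_scores: array of shape (n_images, n_models), higher is better.
    The threshold is the pct-th percentile of all single-model scores,
    pooled across models (an assumption, not stated in the abstract).
    """
    single_scores = np.asarray(single_scores, dtype=float)
    threshold = np.percentile(single_scores, pct)
    mask = (single_scores < threshold).any(axis=1)
    return mask, threshold


def outlier_free_fraction(single_scores, ensemble_scores, pct=5.0):
    """Fraction of high-risk images on which the ensemble is not an outlier.

    ensemble_scores: array of shape (n_images,), same metric as single_scores.
    Returns NaN when no image is flagged as high risk.
    """
    mask, threshold = high_risk_mask(single_scores, pct)
    if not mask.any():
        return float("nan")
    ensemble_scores = np.asarray(ensemble_scores, dtype=float)
    return float((ensemble_scores[mask] >= threshold).mean())
```

With a per-image, per-model score matrix and a vector of ensemble scores, `outlier_free_fraction` gives a number comparable in spirit to the 68% to 100% range reported above, although the exact value depends on the thresholding assumptions.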

Keywords: Automated organ segmentation; Computed tomography; Deep ensembles; Deep neural networks.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Humans
Image Processing, Computer-Assisted* / methods
Neural Networks, Computer*
Tomography, X-Ray Computed / methods