What Does Deep Learning See? Insights From a Classifier Trained to Predict Contrast Enhancement Phase From CT Images

Kenneth A Philbrick; Kotaro Yoshida; Dai Inoue; Zeynettin Akkus; Timothy L Kline; Alexander D Weston; Panagiotis Korfiatis; Naoki Takahashi; Bradley J Erickson

doi:10.2214/AJR.18.20331

AJR Am J Roentgenol. 2018 Dec;211(6):1184-1193. doi: 10.2214/AJR.18.20331. Epub 2018 Nov 7.

Authors

Kenneth A Philbrick¹, Kotaro Yoshida¹, Dai Inoue¹, Zeynettin Akkus¹, Timothy L Kline¹, Alexander D Weston¹, Panagiotis Korfiatis¹, Naoki Takahashi¹, Bradley J Erickson¹

Affiliation

¹ Department of Radiology, Radiology Informatics Laboratory, Mayo Clinic, 3507 17th Ave NW, Rochester, MN 55901.

PMID: 30403527
DOI: 10.2214/AJR.18.20331

Abstract

Objective: Deep learning has shown great promise for improving medical image classification tasks. However, knowing what aspects of an image the deep learning system uses or, in a manner of speaking, sees to make its prediction is difficult.

Materials and methods: Within a radiologic imaging context, we investigated the utility of methods designed to identify features within images on which deep learning activates. In this study, we developed a classifier to identify contrast enhancement phase from whole-slice CT data. We then used this classifier as an easily interpretable system to explore the utility of class activation maps (CAMs), gradient-weighted class activation maps (Grad-CAMs), saliency maps, guided backpropagation maps, and the saliency activation map (SAM), a novel map reported here, to identify image features the model used when performing prediction.
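
For readers unfamiliar with Grad-CAMs, the following is a minimal NumPy sketch of how such a map is commonly computed from one convolutional layer. It is an illustration under stated assumptions, not the implementation used in this study: the function name `grad_cam`, the array shapes, and the toy values are hypothetical, and the final rescaling to [0, 1] is a display convention rather than part of the Grad-CAM definition. The sketch assumes the layer's feature maps and the gradients of the class score with respect to those feature maps have already been extracted from a trained network. It does not cover the SAM, which the abstract does not define.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Coarse Grad-CAM heatmap for one image and one target class.

    activations: feature maps of a convolutional layer, shape (K, H, W).
    gradients:   d(class score) / d(activations), same shape (K, H, W).
    Both inputs are assumed to have been extracted from a trained model.
    """
    activations = np.asarray(activations, dtype=float)
    gradients = np.asarray(gradients, dtype=float)
    if activations.ndim != 3 or activations.shape != gradients.shape:
        raise ValueError("expected two arrays of identical shape (K, H, W)")

    # Channel weights: spatial average of the gradients for each feature map.
    weights = gradients.mean(axis=(1, 2))                # shape (K,)

    # Weighted sum of the feature maps over the channel axis.
    cam = np.tensordot(weights, activations, axes=1)     # shape (H, W)

    # Keep only regions with a positive influence on the class score.
    cam = np.maximum(cam, 0.0)

    # Rescale to [0, 1] for display (assumption: a convention, not required).
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Toy example with two 2x2 feature maps (hypothetical values).
acts = np.array([[[1.0, 0.0], [0.0, 0.0]],
                 [[0.0, 0.0], [0.0, 2.0]]])
grads = np.stack([np.full((2, 2), 0.5),     # channel 0 supports the class
                  np.full((2, 2), -1.0)])   # channel 1 opposes the class
heatmap = grad_cam(acts, grads)
print(heatmap)
```

In practice the resulting map has the spatial resolution of the chosen layer and is typically upsampled to the input image size before being overlaid on the CT slice; that step is omitted here.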

Results: All techniques identified voxels within imaging that the classifier used. SAMs had greater specificity than did guided backpropagation maps, CAMs, and Grad-CAMs at identifying voxels within imaging that the model used to perform prediction. At shallow network layers, SAMs had greater specificity than Grad-CAMs at identifying input voxels that the layers within the model used to perform prediction.

Conclusion: As a whole, voxel-level visualizations and visualizations of the imaging features that activate shallow network layers are powerful techniques to identify features that deep learning models use when performing prediction.

Keywords: CT; class activation map (CAM); computer-aided diagnosis; contrast enhancement phase; convolutional neural network (CNN); deep learning; gradient-weighted class activation map (Grad-CAM); guided backpropagation; machine learning; saliency activation map; saliency map.

MeSH terms

Algorithms
Deep Learning*
Humans
Image Processing, Computer-Assisted*
Sensitivity and Specificity
Tomography, X-Ray Computed*