Weakly supervised segmentation models as explainable radiological classifiers for lung tumour detection on CT images

Insights Imaging. 2023 Nov 19;14(1):195. doi: 10.1186/s13244-023-01542-2.

Abstract

Purpose: Interpretability is essential for reliable convolutional neural network (CNN) image classifiers in radiological applications. We describe a weakly supervised segmentation model that learns to delineate the target object, trained with only image-level labels ("image contains object" or "image does not contain object"), presenting a different approach towards explainable object detectors for radiological imaging tasks.

Methods: A weakly supervised Unet architecture (WSUnet) was trained to learn lung tumour segmentation from image-level labelled data. WSUnet generates voxel probability maps with a Unet and then constructs an image-level prediction by global max-pooling, thereby facilitating image-level training. WSUnet's voxel-level predictions were compared to traditional model interpretation techniques (class activation mapping, integrated gradients and occlusion sensitivity) in CT data from three institutions (training/validation: n = 412; testing: n = 142). Methods were compared using voxel-level discrimination metrics and clinical value was assessed with a clinician preference survey on data from external institutions.
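The pooling step described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the Unet has already produced a voxel probability map (here a hand-written nested list), the names `global_max_pool` and `image_level_bce` are hypothetical, and binary cross-entropy is assumed as the image-level loss because the abstract does not name one. It shows only how a single image-level prediction, and a loss against an image-level label, can be derived from voxel-level outputs.

```python
import math


def global_max_pool(prob_map):
    """Return the largest voxel probability in a nested-list volume."""
    def flatten(x):
        # Walk arbitrarily nested lists/tuples and yield scalar values.
        if isinstance(x, (list, tuple)):
            for item in x:
                yield from flatten(item)
        else:
            yield float(x)
    return max(flatten(prob_map))


def image_level_bce(prob_map, label, eps=1e-7):
    """Binary cross-entropy between the pooled prediction and an image label.

    Assumption: BCE is used purely for illustration; the source abstract
    does not specify the training loss.
    """
    # Clamp to avoid log(0) when the pooled probability is exactly 0 or 1.
    p = min(max(global_max_pool(prob_map), eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


# Toy 2x2x2 "volume" of voxel probabilities (made-up values).
volume = [
    [[0.02, 0.10], [0.05, 0.91]],
    [[0.03, 0.04], [0.08, 0.07]],
]

image_pred = global_max_pool(volume)    # image-level prediction
loss_pos = image_level_bce(volume, 1)   # label: "image contains object"
loss_neg = image_level_bce(volume, 0)   # label: "image does not contain object"
```

In this toy case the image-level prediction is determined entirely by the single highest-scoring voxel, which is why only an image-level label is needed to compute the loss.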

Results: Despite the absence of voxel-level labels in training, WSUnet's voxel-level predictions localised tumours precisely in both validation (precision: 0.77, 95% CI: [0.76-0.80]; Dice: 0.43, 95% CI: [0.39-0.46]) and external testing (precision: 0.78, 95% CI: [0.76-0.81]; Dice: 0.33, 95% CI: [0.32-0.35]). WSUnet's voxel-level discrimination outperformed the best comparator in validation (area under the precision-recall curve (AUPR): 0.55, 95% CI: [0.49-0.56] vs. 0.23, 95% CI: [0.21-0.25]) and testing (AUPR: 0.40, 95% CI: [0.38-0.41] vs. 0.36, 95% CI: [0.34-0.37]). Clinicians preferred WSUnet predictions in most instances (clinician preference rate: 0.72, 95% CI: [0.68-0.77]).
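For readers unfamiliar with the voxel-level metrics quoted above, the following sketch computes precision and the Dice coefficient from two binary masks using their standard definitions. The function name `precision_and_dice`, the flattened toy masks, and the convention of returning 0.0 when a denominator is empty are assumptions made for illustration; none of this is taken from the study's evaluation code.

```python
def precision_and_dice(pred_mask, true_mask):
    """Compute voxel-level precision and Dice for two flat binary masks.

    Assumption: masks are equal-length sequences of 0/1 values, and a
    metric with an empty denominator is reported as 0.0.
    """
    tp = fp = fn = 0
    for p, t in zip(pred_mask, true_mask):
        if p and t:
            tp += 1          # predicted tumour, truly tumour
        elif p and not t:
            fp += 1          # predicted tumour, truly background
        elif t and not p:
            fn += 1          # missed tumour voxel
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    dice = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return precision, dice


# Toy example: the prediction covers 2 of 5 tumour voxels, with no false positives.
pred = [1, 1, 0, 0, 0, 0, 0, 0]
true = [1, 1, 1, 1, 1, 0, 0, 0]
precision, dice = precision_and_dice(pred, true)
```

The toy masks show how precision can be high while Dice stays moderate: a prediction that marks only part of the object is penalised by Dice for the missed voxels but not by precision. Whether this explains the gap between the reported precision and Dice values is not stated in the abstract.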

Conclusion: Weakly supervised segmentation is a viable approach by which explainable object detection models may be developed for medical imaging.

Critical relevance statement: WSUnet learns to segment images at voxel level, training only with image-level labels. A Unet backbone first generates a voxel-level probability map; the maximum voxel prediction is then taken as the image-level prediction. Thus, training uses only image-level annotations, reducing human workload. WSUnet's voxel-level predictions provide a causally verifiable explanation for its image-level prediction, improving interpretability.

Key points:
• Explainability and interpretability are essential for reliable medical image classifiers.
• This study applies weakly supervised segmentation to generate explainable image classifiers.
• The weakly supervised Unet inherently explains its image-level predictions at voxel level.

Keywords: Explainable artificial intelligence; Lung neoplasms; Model interpretation; Tumour segmentation; Weakly supervised learning.