The uncertainty in predictions from deep neural network analysis of medical imaging is challenging to assess but potentially important to include in subsequent decision-making. Using data from diabetic retinopathy detection, we present an empirical evaluation of the role of model calibration in uncertainty-based referral, an approach that prioritizes referral of observations based on the magnitude of a measure of uncertainty. We consider several configurations of network architecture, methods for uncertainty estimation, and training data size. We identify a strong relationship between the effectiveness of uncertainty-based referral and having a well-calibrated model. This finding is especially relevant because complex deep neural networks tend to have high calibration errors. Finally, we show that post-calibration of the neural network helps uncertainty-based referral identify hard-to-classify observations.
Keywords: Deep learning; calibration; diabetic retinopathy; domain adaptation; post-calibration; uncertainty estimation.
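
As an illustration of the two quantities the abstract relates, the following minimal sketch shows one way uncertainty-based referral and a calibration error could be computed for a binary classifier. It is not the authors' implementation: the function names are hypothetical, predictive entropy is assumed as the uncertainty measure, expected calibration error (ECE) with equal-width bins is assumed as the calibration metric, and the probabilities and labels are made-up toy values.

```python
import math


def predictive_entropy(p):
    """Binary entropy (in nats) of a predicted disease probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def refer_most_uncertain(probs, labels, refer_fraction):
    """Refer the most uncertain cases; return (referred indices, retained accuracy).

    probs: predicted probability of disease for each observation.
    labels: ground-truth 0/1 labels.
    refer_fraction: share of observations sent to a human expert.
    """
    n_refer = int(round(refer_fraction * len(probs)))
    # Sort indices from most to least uncertain.
    order = sorted(range(len(probs)),
                   key=lambda i: predictive_entropy(probs[i]),
                   reverse=True)
    referred = set(order[:n_refer])
    retained = [i for i in range(len(probs)) if i not in referred]
    if not retained:
        return referred, float("nan")
    correct = sum(int(probs[i] >= 0.5) == labels[i] for i in retained)
    return referred, correct / len(retained)


def expected_calibration_error(probs, labels, n_bins=10):
    """ECE over equal-width confidence bins for binary predictions."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        pred = int(p >= 0.5)
        conf = p if pred == 1 else 1.0 - p  # confidence in the predicted class
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, int(pred == y)))
    ece = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            acc = sum(a for _, a in b) / len(b)
            ece += len(b) / len(probs) * abs(avg_conf - acc)
    return ece


# Toy, made-up data for illustration only.
probs = [0.95, 0.55, 0.10, 0.48, 0.80, 0.30]
labels = [1, 0, 0, 1, 1, 0]
referred, acc = refer_most_uncertain(probs, labels, refer_fraction=1 / 3)
ece = expected_calibration_error(probs, labels)
```

In this toy case the two observations closest to a probability of 0.5 are referred, and accuracy on the retained observations is higher than accuracy with no referral. When the model is poorly calibrated, the entropy ranking may no longer track the truly hard cases, which is the relationship the study examines empirically.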