Artificial Intelligence with Statistical Confidence Scores for Detection of Acute or Subacute Hemorrhage on Noncontrast CT Head Scans

Radiol Artif Intell. 2022 Apr 20;4(3):e210115. doi: 10.1148/ryai.210115. eCollection 2022 May.

Abstract

Purpose: To present a method that automatically detects, subtypes, and locates acute or subacute intracranial hemorrhage (ICH) on noncontrast CT (NCCT) head scans; generates detection confidence scores to identify high-confidence data subsets with higher accuracy; and improves radiology worklist prioritization. Such scores may enable clinicians to better use artificial intelligence (AI) tools.

Materials and methods: This retrospective study included 46 057 studies from seven "internal" centers for development (training, architecture selection, hyperparameter tuning, and operating-point calibration; n = 25 946) and evaluation (n = 2947) and three "external" centers for calibration (n = 400) and evaluation (n = 16 764). Internal centers contributed developmental data, whereas external centers did not. Deep neural networks predicted the presence of ICH and subtypes (intraparenchymal, intraventricular, subarachnoid, subdural, and/or epidural hemorrhage) and segmentations per case. Two ICH confidence scores are discussed: a calibrated classifier entropy score and a Dempster-Shafer score. Evaluation was completed by using receiver operating characteristic curve analysis and report turnaround time (RTAT) modeling on the evaluation set and on confidence score-defined subsets using bootstrapping.

Results: The areas under the receiver operating characteristic curve for ICH were 0.97 (0.97, 0.98) and 0.95 (0.94, 0.95) on internal and external center data, respectively. On 80% of the data stratified by calibrated classifier and Dempster-Shafer scores, the system improved the Youden indexes, increasing them from 0.84 to 0.93 (calibrated classifier) and from 0.84 to 0.92 (Dempster-Shafer) for internal centers and increasing them from 0.78 to 0.88 (calibrated classifier) and from 0.78 to 0.89 (Dempster-Shafer) for external centers (P < .001). Models estimated shorter RTAT for AI-prioritized worklists with confidence measures than for AI-prioritized worklists without confidence measures, shortening RTAT by 27% (calibrated classifier) and 27% (Dempster-Shafer) for internal centers and shortening RTAT by 25% (calibrated classifier) and 27% (Dempster-Shafer) for external centers (P < .001).

Conclusion: AI that provided statistical confidence measures for ICH detection on NCCT scans reliably detected and subtyped hemorrhages, identified high-confidence predictions, and improved worklist prioritization in simulation.Keywords: CT, Head/Neck, Hemorrhage, Convolutional Neural Network (CNN) Supplemental material is available for this article. © RSNA, 2022.

Keywords: CT; Convolutional Neural Network (CNN); Head/Neck; Hemorrhage.