Implications for downstream workload based on calibrating an artificial intelligence detection algorithm by standalone-reader or combined-reader sensitivity matching

Karin Dembrower; Mattie Salim; Martin Eklund; Peter Lindholm; Fredrik Strand

doi:10.1117/1.JMI.10.S2.S22405

J Med Imaging (Bellingham). 2023 Feb;10(Suppl 2):S22405. doi: 10.1117/1.JMI.10.S2.S22405. Epub 2023 Apr 5.

Authors

Karin Dembrower¹, Mattie Salim², Martin Eklund³, Peter Lindholm⁴, Fredrik Strand²

Affiliations

¹ Capio S:t Göran Hospital, Department of Radiology, Stockholm, Sweden.
² Karolinska Institutet, Department of Oncology and Pathology, Solna, Sweden.
³ Karolinska Institutet, Department of Medical Epidemiology and Biostatistics, Solna, Sweden.
⁴ Karolinska Institutet, Department of Physiology and Pharmacology, Solna, Sweden.

Abstract

Purpose: In double-reading of screening mammograms, artificial intelligence (AI) algorithms hold promise as a potential replacement for one of the two readers. The choice of operating point, or abnormality threshold, for the AI algorithm will affect cancer detection and workload. In our retrospective study, the baseline approach set the operating point by matching the stand-alone sensitivity of AI to that of a single reader, while the alternative approach matched the combined sensitivity of AI plus one human reader to that of two human readers.

Approach: Full-field digital screening mammograms within the Stockholm County area between February 1, 2012, and December 30, 2015, acquired on Philips equipment, were collected. All exams of women with breast cancer within 23 months of screening and a random selection of healthy controls were included. An exam-level continuous AI abnormality score was generated (Insight MMG from Lunit Inc). Sensitivity and abnormal interpretation rates were estimated for operating points defined by the standalone-reader approach and the combined-reader approach.
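The two calibration approaches can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy scores, and the reader flags below are all hypothetical, and the sketch assumes an exam is flagged when its AI score is at or above the threshold, and that a combined read flags an exam when either the reader or AI flags it.

```python
# Hypothetical toy data: AI scores and reader 1 decisions for 10 cancer exams
# and 10 healthy exams. None of these values come from the study.
cancer_scores = [0.95, 0.9, 0.8, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05, 0.02]
cancer_reader1 = [True, True, False, True, True, True, True, False, False, False]
healthy_scores = [0.85, 0.5, 0.35, 0.2, 0.1, 0.05, 0.03, 0.02, 0.01, 0.01]
healthy_reader1 = [False] * 9 + [True]


def standalone_threshold(scores, target_sensitivity):
    """Highest threshold at which AI alone flags at least the target share of cancers."""
    for t in sorted(set(scores), reverse=True):
        if sum(s >= t for s in scores) / len(scores) >= target_sensitivity:
            return t
    return min(scores)


def combined_threshold(scores, reader_flags, target_sensitivity):
    """Highest threshold at which (reader OR AI) flags at least the target share of cancers."""
    for t in sorted(set(scores), reverse=True):
        hits = sum(f or s >= t for s, f in zip(scores, reader_flags))
        if hits / len(scores) >= target_sensitivity:
            return t
    return min(scores)


def abnormal_rate(scores, reader_flags, threshold):
    """Share of exams flagged by the reader or by AI at the given threshold."""
    return sum(f or s >= threshold for s, f in zip(scores, reader_flags)) / len(scores)


# Standalone matching: AI alone must reach reader 1's sensitivity (0.6 in the toy data).
t_standalone = standalone_threshold(cancer_scores, 0.6)
# Combined matching: reader 1 + AI must reach an assumed two-reader sensitivity of 0.7.
t_combined = combined_threshold(cancer_scores, cancer_reader1, 0.7)

print(t_standalone, abnormal_rate(healthy_scores, healthy_reader1, t_standalone))
print(t_combined, abnormal_rate(healthy_scores, healthy_reader1, t_combined))
```

In this toy example the combined-reader approach yields a higher threshold, because AI only needs to add the cancers that reader 1 missed, so fewer healthy exams are flagged.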

Results: The study population included 1684 exams of women with breast cancer and 5024 exams of healthy women. Observations of healthy women were up-sampled to attain a realistic proportion of cancer. The observed sensitivity for reader 1, reader 2, and readers 1 + 2 combined was 69.7%, 75.6%, and 78.6%, at abnormal interpretation rates of 4.4%, 4.6%, and 6.1%, respectively. For the combination of reader 1 + AI, we estimated a sensitivity of 82.4% for standalone-reader matching and 78.6% for combined-reader matching, at abnormal interpretation rates of 12.6% and 7.0%, respectively.

Conclusions: Setting the operating point by matching the stand-alone sensitivity of AI with that of a single radiologist will nearly double the downstream workload, compared to a modest increase of 15% for the alternative method of matching the sensitivity of AI plus a radiologist with that of two radiologists.
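The workload comparison can be checked with simple arithmetic on the reported abnormal interpretation rates. The sketch below assumes that the baseline is the two-reader rate of 6.1%; the function name is hypothetical.

```python
def relative_increase(new_rate, baseline_rate):
    """Relative change in abnormal interpretation rate versus the baseline."""
    return (new_rate - baseline_rate) / baseline_rate


two_readers = 6.1           # reader 1 + reader 2, percent of exams
standalone_matching = 12.6  # reader 1 + AI, standalone-reader matching
combined_matching = 7.0     # reader 1 + AI, combined-reader matching

print(f"{relative_increase(standalone_matching, two_readers):.1%}")
print(f"{relative_increase(combined_matching, two_readers):.1%}")
```

Under that assumption, standalone-reader matching roughly doubles the share of exams flagged as abnormal, while combined-reader matching raises it by about 15%.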

Keywords: artificial intelligence; breast cancer screening; calibration; computer-aided detection; implementation.