Combining the strengths of radiologists and AI for breast cancer screening: a retrospective analysis

Christian Leibig; Moritz Brehmer; Stefan Bunk; Danalyn Byng; Katja Pinker; Lale Umutlu

doi:10.1016/S2589-7500(22)00070-X

Lancet Digit Health. 2022 Jul;4(7):e507-e519. doi: 10.1016/S2589-7500(22)00070-X.

Authors

Christian Leibig¹, Moritz Brehmer², Stefan Bunk³, Danalyn Byng³, Katja Pinker⁴, Lale Umutlu⁵

Affiliations

¹ Vara, Berlin, Germany. Electronic address: christian.leibig@vara.ai.
² Vara, Berlin, Germany; Department of Diagnostic and Interventional Radiology and Neuroradiology, University-Hospital Essen, Essen, Germany.
³ Vara, Berlin, Germany.
⁴ Department of Radiology, Breast Imaging Service, Memorial Sloan Kettering Cancer Center, New York, NY, USA; Department of Biomedical Imaging and Image-Guided Therapy Division of Molecular and Gender Imaging, Medical University of Vienna, Vienna, Austria.
⁵ Department of Diagnostic and Interventional Radiology and Neuroradiology, University-Hospital Essen, Essen, Germany.

Abstract

Background: We propose a decision-referral approach for integrating artificial intelligence (AI) into the breast-cancer screening pathway, whereby the algorithm makes predictions on the basis of its quantification of uncertainty. Algorithmic assessments with high certainty are done automatically, whereas assessments with lower certainty are referred to the radiologist. This two-part AI system can triage normal mammography exams and provide post-hoc cancer detection to maintain a high degree of sensitivity. This study aimed to evaluate the performance of this AI system on sensitivity and specificity when used either as a standalone system or within a decision-referral approach, compared with the original radiologist decision.
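The routing logic described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Exam` type, the `ai_score` field, and the two thresholds `t_low` and `t_high` are hypothetical names and values chosen only for illustration, and the sketch assumes the algorithm's certainty can be summarised as a single score in [0, 1].

```python
from dataclasses import dataclass


@dataclass
class Exam:
    ai_score: float           # hypothetical malignancy score in [0, 1]
    radiologist_recall: bool  # the radiologist's own recall decision


def decision_referral(exam: Exam, t_low: float = 0.05, t_high: float = 0.95):
    """Return (final_recall_decision, decided_by) for one exam.

    t_low and t_high are illustrative assumptions, not values from the study.
    """
    if exam.ai_score <= t_low:
        # Confidently normal: the AI triages the exam without referral.
        return False, "ai"
    if exam.ai_score >= t_high:
        # Confidently suspicious: the AI flags the exam.
        return True, "ai"
    # Uncertain band: the exam is referred and the radiologist decides.
    return exam.radiologist_recall, "radiologist"


if __name__ == "__main__":
    for exam in (Exam(0.01, False), Exam(0.50, True), Exam(0.99, False)):
        print(exam, decision_referral(exam))
```

Moving the two thresholds apart sends more exams to the radiologist; moving them together automates more decisions, which is one way to picture the different configurations evaluated in the study.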

Methods: We used a retrospective dataset consisting of 1 193 197 full-field, digital mammography studies carried out between Jan 1, 2007, and Dec 31, 2020, from eight screening sites participating in the German national breast-cancer screening programme. We derived an internal-test dataset from six screening sites (1670 screen-detected cancers and 19 997 normal mammography exams), and an external-test dataset of breast cancer screening exams (2793 screen-detected cancers and 80 058 normal exams) from two additional screening sites to evaluate the performance of an AI algorithm on sensitivity and specificity when used either as a standalone system or within a decision-referral approach, compared with the original individual radiologist decision at the point-of-screen reading ahead of the consensus conference. Different configurations of the AI algorithm were evaluated. To account for the enrichment of the datasets caused by oversampling cancer cases, weights were applied to reflect the actual distribution of study types in the screening programme. Triaging performance was evaluated as the rate of exams correctly identified as normal. Sensitivity across clinically relevant subgroups, screening sites, and device manufacturers was compared between standalone AI, the radiologist, and decision referral. We present receiver operating characteristic (ROC) curves and area under the ROC (AUROC) to evaluate AI-system performance over its entire operating range. Comparison with radiologists and subgroup analysis was based on sensitivity and specificity at clinically relevant configurations.
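The reweighting step can be pictured with a short sketch of weighted sensitivity and specificity. This is an illustration under stated assumptions, not the study's code: the function name, the per-exam `weights`, and the toy data are hypothetical, and the abstract does not specify how the weights were derived.

```python
def weighted_sens_spec(y_true, y_pred, weights):
    """Weighted sensitivity and specificity.

    y_true:  1 for cancer, 0 for normal
    y_pred:  1 for recall, 0 for no recall
    weights: per-exam weight (assumed here to undo dataset enrichment)
    """
    tp = fn = tn = fp = 0.0
    for truth, pred, w in zip(y_true, y_pred, weights):
        if truth and pred:
            tp += w
        elif truth and not pred:
            fn += w
        elif not truth and pred:
            fp += w
        else:
            tn += w
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


if __name__ == "__main__":
    # Toy data: two cancers and three non-cancer exams, where one
    # non-cancer study type is assumed to be over-represented and
    # therefore carries a smaller weight.
    y_true = [1, 1, 0, 0, 0]
    y_pred = [1, 0, 0, 1, 0]
    weights = [1.0, 1.0, 10.0, 1.0, 10.0]
    print(weighted_sens_spec(y_true, y_pred, weights))
```

Weights change these two metrics only when they differ between exams of the same true class, for example when some study types among non-cancer exams were sampled at a higher rate than others.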

Findings: The exemplary configuration of the AI system in standalone mode achieved a sensitivity of 84·2% (95% CI 82·4-85·8) and a specificity of 89·5% (89·0-89·9) on internal-test data, and a sensitivity of 84·6% (83·3-85·9) and a specificity of 91·3% (91·1-91·5) on external-test data, but was less accurate than the average unaided radiologist. By contrast, the simulated decision-referral approach significantly improved upon radiologist sensitivity by 2·6 percentage points and specificity by 1·0 percentage points, corresponding to a triaging performance at 63·0% on the external dataset; the AUROC was 0·982 (95% CI 0·978-0·986) on the subset of studies assessed by AI, surpassing radiologist performance. The decision-referral approach also yielded significant increases in sensitivity for a number of clinically relevant subgroups, including subgroups of small lesion sizes and invasive carcinomas. Sensitivity of the decision-referral approach was consistent across the eight included screening sites and three device manufacturers.

Interpretation: The decision-referral approach leverages the strengths of both the radiologist and AI, demonstrating improvements in sensitivity and specificity surpassing that of the individual radiologist and of the standalone AI system. This approach has the potential to improve the screening accuracy of radiologists, is adaptive to the requirements of screening, and could allow for the reduction of workload ahead of the consensus conference, without discarding the generalised knowledge of radiologists.

Funding: Vara.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Artificial Intelligence
Breast Neoplasms* / diagnostic imaging
Early Detection of Cancer*
Female
Humans
Radiologists
Retrospective Studies

Grants and funding

P30 CA008748/CA/NCI NIH HHS/United States