Mammography Breast Cancer Screening Triage Using Deep Learning: A UK Retrospective Study

Sarah E Hickman; Nicholas R Payne; Richard T Black; Yuan Huang; Andrew N Priest; Sue Hudson; Bahman Kasmai; Arne Juette; Muzna Nanaa; Muhammad Iqbal Aniq; Anna Sienko; Fiona J Gilbert

doi:10.1148/radiol.231173

Radiology. 2023 Nov;309(2):e231173. doi: 10.1148/radiol.231173.

Affiliation

¹ From the Department of Radiology, University of Cambridge School of Clinical Medicine, Box 218, Level 5, Cambridge Biomedical Campus, Cambridge CB2 0QQ, UK (S.E.H., N.R.P., Y.H., A.N.P., M.N., F.J.G.); University of Cambridge School of Clinical Medicine, Cambridge, UK (M.I.A., A.S.); Department of Radiology, Barts Health NHS Trust, The Royal London Hospital, London, UK (S.E.H.); Department of Radiology, Addenbrooke's Hospital, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK (R.T.B., A.N.P., F.J.G.); EPSRC Cambridge Mathematics of Information in Healthcare Hub, University of Cambridge, Cambridge, UK (Y.H.); Peel & Schriek Consulting, London, UK (S.H.); Department of Radiology, Norfolk and Norwich University Hospital, Norwich, UK (B.K., A.J.); and University of East Anglia, Norwich Research Park, Norwich, UK (B.K.).

PMID: 37987665

Abstract

Background Breast screening enables early detection of cancers; however, most women have normal mammograms, resulting in repetitive and resource-intensive reading tasks.

Purpose To investigate if deep learning (DL) algorithms can be used to triage mammograms by identifying normal results to reduce workload or flag cancers that may be overlooked.

Materials and Methods In this retrospective study, three commercial DL algorithms were investigated using consecutive mammograms from two UK Breast Screening Program sites from January 2015 to December 2017 and January 2017 to December 2018 on devices from two mammography vendors. Normal mammograms with a 3-year follow-up and histopathologically proven cancer detected at screening, the subsequent round, or in the 3-year interval were included. Two algorithm thresholds were set: in scenario A, 99.0% sensitivity for rule-out triage to a lone reader, and in scenario B, approximately 1.0% additional recall providing a rule-in triage for further assessment. Both thresholds were then applied to the screening workflow in scenario C. The sensitivity and specificity were used to assess the overall predictive performance of each DL algorithm.

Results The data set comprised 78 849 patients (median age, 59 years [IQR, 53-63 years]) and 887 screening-detected, 439 interval, and 688 subsequent screening round-detected cancers. In scenario A (rule-out triage), models DL-1, DL-2, and DL-3 triaged 35.0% (27 565 of 78 849), 53.2% (41 937 of 78 849), and 55.6% (43 869 of 78 849) of mammograms, respectively, with 0.0% (0 of 887) to 0.1% (one of 887) of screening-detected cancers undetected. In scenario B, DL algorithms triaged in 4.6% (20 of 439) to 8.2% (36 of 439) of interval and 5.2% (36 of 688) to 6.1% (42 of 688) of subsequent-round cancers when applied after the routine double-reading workflow. Combining both approaches in scenario C resulted in an overall noninferior specificity (difference, -0.9%; P < .001) and superior sensitivity (difference, 2.7%; P < .001) for the adaptive workflow compared with routine double reading for all three algorithms.

Conclusion Rule-out and rule-in DL-adapted triage workflows can improve the efficiency and efficacy of mammography breast cancer screening.

© RSNA, 2023

Supplemental material is available for this article. See also the editorial by Nishikawa and Lu in this issue.
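The two operating points described in the Materials and Methods (a rule-out threshold fixed at 99.0% sensitivity and a rule-in threshold fixed at roughly 1.0% additional recall) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the synthetic scores, and the simulated reader decisions are all hypothetical, and the sketch assumes that each algorithm outputs one continuous suspicion score per examination, that sensitivity is measured on a labelled set of cancers, and that additional recall is counted as a fraction of all screened examinations among those the human readers did not recall.

```python
import numpy as np


def rule_out_threshold(scores, is_cancer, target_sensitivity=0.99):
    """Highest cutoff that still keeps the target sensitivity (hypothetical helper).

    Examinations scoring below the cutoff would be triaged as normal.
    """
    cancer_scores = np.sort(scores[is_cancer])
    # Number of cancers that may fall below the cutoff without
    # dropping sensitivity under the target.
    allowed_misses = int(np.floor(cancer_scores.size * (1 - target_sensitivity)))
    return cancer_scores[allowed_misses]


def rule_in_threshold(scores, recalled_by_readers, extra_recall_rate=0.01):
    """Cutoff that flags about extra_recall_rate of all examinations,
    drawn only from those the readers did not recall (hypothetical helper)."""
    candidates = np.sort(scores[~recalled_by_readers])[::-1]  # highest first
    n_extra = int(round(extra_recall_rate * scores.size))
    if n_extra == 0:
        return np.inf  # nothing is flagged
    return candidates[n_extra - 1]


# Synthetic data for illustration only; none of these values come from the study.
rng = np.random.default_rng(0)
n_cases = 10_000
is_cancer = rng.random(n_cases) < 0.02
scores = np.where(is_cancer, rng.beta(5, 2, n_cases), rng.beta(2, 5, n_cases))
# Toy stand-in for the double-reading decision.
recalled_by_readers = is_cancer & (rng.random(n_cases) < 0.8)

t_out = rule_out_threshold(scores, is_cancer)
t_in = rule_in_threshold(scores, recalled_by_readers)

sensitivity = np.mean(scores[is_cancer] >= t_out)
triaged_normal = np.mean(scores < t_out)
extra_recall = np.mean((scores >= t_in) & ~recalled_by_readers)

print(f"rule-out cutoff {t_out:.3f}: sensitivity {sensitivity:.3f}, "
      f"triaged as normal {triaged_normal:.1%}")
print(f"rule-in cutoff {t_in:.3f}: additional recall {extra_recall:.1%}")
```

Applying both cutoffs to the same examinations corresponds to the combined workflow of scenario C: scores below the rule-out cutoff go to a lone reader, and scores at or above the rule-in cutoff that the readers did not recall are sent for further assessment.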

MeSH terms

Breast Neoplasms* / diagnostic imaging
Deep Learning*
Early Detection of Cancer
Female
Humans
Mammography
Middle Aged
Retrospective Studies
Triage
United Kingdom