Comparison of the Diagnostic Accuracy of Mammogram-based Deep Learning and Traditional Breast Cancer Risk Models in Patients Who Underwent Supplemental Screening with MRI

Leslie R Lamb; Sarah F Mercaldo; Kimeya Ghaderi; Andrew Carney; Constance D Lehman

doi:10.1148/radiol.223077

Radiology. 2023 Sep;308(3):e223077. doi: 10.1148/radiol.223077.

Authors

Leslie R Lamb¹, Sarah F Mercaldo¹, Kimeya Ghaderi¹, Andrew Carney¹, Constance D Lehman¹

Affiliation

¹ From the Department of Radiology, Massachusetts General Hospital, 55 Fruit St, Boston, MA 02114-2696.

PMID: 37724967

Abstract

Background: Access to supplemental screening breast MRI is determined using traditional risk models, which are limited by modest predictive accuracy.

Purpose: To compare the diagnostic accuracy of a mammogram-based deep learning (DL) risk assessment model to that of traditional breast cancer risk models in patients who underwent supplemental screening with MRI.

Materials and Methods: This retrospective study included consecutive patients undergoing breast cancer screening MRI from September 2017 to September 2020 at four facilities. Risk was assessed using the Tyrer-Cuzick (TC) and National Cancer Institute Breast Cancer Risk Assessment Tool (BCRAT) 5-year and lifetime models as well as a DL 5-year model that generated a risk score based on the most recent screening mammogram. A risk score of 1.67% or higher defined increased risk for traditional 5-year models, a risk score of 20% or higher defined high risk for traditional lifetime models, and absolute scores of 2.3 or higher and 6.6 or higher defined increased and high risk, respectively, for the DL model. Model accuracy metrics including cancer detection rate (CDR) and positive predictive values (PPVs) (PPV of abnormal findings at screening [PPV1], PPV of biopsies recommended [PPV2], and PPV of biopsies performed [PPV3]) were compared using logistic regression models.

Results: This study included 2168 women who underwent 4247 high-risk screening MRI examinations (median age, 54 years [IQR, 48-60 years]). CDR (per 1000 examinations) was higher in patients at high risk according to the DL model (20.6 [95% CI: 11.8, 35.6]) than according to the TC (6.0 [95% CI: 2.9, 12.3]; P < .01) and BCRAT (6.8 [95% CI: 2.9, 15.8]; P = .04) lifetime models. PPV1, PPV2, and PPV3 were higher in patients identified as high risk by the DL model (PPV1, 14.6%; PPV2, 32.4%; PPV3, 36.4%) than those identified as high risk with the TC (PPV1, 5.0%; PPV2, 12.7%; PPV3, 13.5%; P value range, .02-.03) and BCRAT (PPV1, 5.5%; PPV2, 11.1%; PPV3, 12.5%; P value range, .02-.05) lifetime models.

Conclusion: Patients identified as high risk by a mammogram-based DL risk assessment model showed higher CDR at breast screening MRI than patients identified as high risk with traditional risk models.

© RSNA, 2023

Supplemental material is available for this article.

See also the editorial by Bae in this issue.
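The risk cutoffs and audit metrics named in the abstract can be illustrated with a short sketch. It applies the stated thresholds (1.67% for traditional 5-year models, 20% for traditional lifetime models, absolute scores of 2.3 and 6.6 for the DL model), then computes CDR per 1000 examinations and PPV1, PPV2, and PPV3 for a subgroup, using the usual audit definitions (cancers among examinations with an abnormal finding, a biopsy recommendation, or a biopsy performed). The `Exam` record layout, its field names, and the helper functions are hypothetical, and the toy records are invented, not study data. The sketch does not reproduce the logistic regression comparison or the confidence-interval method used in the study.

```python
from dataclasses import dataclass
from typing import Dict, List

# Cutoffs as stated in the abstract.
TRAD_5YR_INCREASED = 1.67  # % 5-year risk, TC or BCRAT 5-year model
TRAD_LIFETIME_HIGH = 20.0  # % lifetime risk, TC or BCRAT lifetime model
DL_INCREASED = 2.3         # absolute DL 5-year score
DL_HIGH = 6.6              # absolute DL 5-year score


@dataclass
class Exam:
    """One screening MRI examination (hypothetical record layout)."""
    tc_lifetime_pct: float    # Tyrer-Cuzick lifetime risk, %
    dl_score: float           # DL 5-year score from the latest mammogram
    abnormal: bool            # abnormal finding at screening MRI
    biopsy_recommended: bool
    biopsy_performed: bool
    cancer: bool              # cancer diagnosed as a result of this exam


def trad_5yr_increased(pct: float) -> bool:
    """Increased risk under a traditional 5-year model."""
    return pct >= TRAD_5YR_INCREASED


def trad_lifetime_high(pct: float) -> bool:
    """High risk under a traditional lifetime model."""
    return pct >= TRAD_LIFETIME_HIGH


def dl_category(score: float) -> str:
    """Map a DL score to 'high', 'increased', or 'neither'."""
    if score >= DL_HIGH:
        return "high"
    if score >= DL_INCREASED:
        return "increased"
    return "neither"


def _ppv(group: List[Exam]) -> float:
    # Fraction of exams in the group that led to a cancer diagnosis.
    return sum(e.cancer for e in group) / len(group) if group else float("nan")


def screening_metrics(exams: List[Exam]) -> Dict[str, float]:
    """CDR per 1000 examinations plus PPV1, PPV2 and PPV3."""
    abnormal = [e for e in exams if e.abnormal]
    recommended = [e for e in exams if e.biopsy_recommended]
    performed = [e for e in exams if e.biopsy_performed]
    # Screen-detected cancers: cancers that followed an abnormal finding.
    detected = sum(e.cancer for e in abnormal)
    cdr = 1000 * detected / len(exams) if exams else float("nan")
    return {
        "CDR_per_1000": cdr,
        "PPV1": _ppv(abnormal),
        "PPV2": _ppv(recommended),
        "PPV3": _ppv(performed),
    }


# Toy records for illustration only, NOT study data.
exams = [
    Exam(25.0, 7.1, True, True, True, True),
    Exam(12.0, 6.8, True, True, True, False),
    Exam(22.0, 3.0, True, False, False, False),
    Exam(15.0, 1.2, False, False, False, False),
]
dl_high = [e for e in exams if dl_category(e.dl_score) == "high"]
tc_high = [e for e in exams if trad_lifetime_high(e.tc_lifetime_pct)]
print(screening_metrics(dl_high))
print(screening_metrics(tc_high))
```

In this toy example both high-risk subgroups contain two examinations and share the same CDR, but PPV2 and PPV3 differ because only one of the TC high-risk examinations led to a biopsy recommendation. Because the metrics are computed per examination, a patient with several MRI examinations contributes several records, as in the study's count of 4247 examinations in 2168 women.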

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Breast Neoplasms* / diagnostic imaging
Deep Learning*
Early Detection of Cancer
Female
Humans
Magnetic Resonance Imaging
Middle Aged
Retrospective Studies