Diagnostic performance of deep learning in ultrasound diagnosis of breast cancer: a systematic review

Qing Dan; Ziting Xu; Hannah Burrows; Jennifer Bissram; Jeffrey S A Stringer; Yingjia Li

doi:10.1038/s41698-024-00514-z

NPJ Precis Oncol. 2024 Jan 27;8(1):21. doi: 10.1038/s41698-024-00514-z.

Authors

Qing Dan^#¹ ², Ziting Xu^#¹, Hannah Burrows³, Jennifer Bissram³, Jeffrey S A Stringer⁴, Yingjia Li⁵

Affiliations

¹ Department of Ultrasound, Nanfang Hospital, Southern Medical University, 510515, Guangzhou, China.
² Global Women's Health, The University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA.
³ Health Sciences Library, The University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA.
⁴ Global Women's Health, The University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA. jeffrey_stringer@med.unc.edu.
⁵ Department of Ultrasound, Nanfang Hospital, Southern Medical University, 510515, Guangzhou, China. lyjia@smu.edu.cn.

^# Contributed equally.

Abstract

Deep learning (DL) has been widely investigated in breast ultrasound (US) for distinguishing between benign and malignant breast masses. This systematic review of diagnostic test accuracy examines the accuracy of DL, compared with human readers, for the diagnosis of breast cancer on US in clinical settings. Our literature search covered PubMed, Embase, Scopus, and the Cochrane Library. Test accuracy outcomes were synthesized to compare the diagnostic performance of DL and human readers and to evaluate the assistive role of DL for human readers. A total of 16 studies involving 9238 female participants were included. There were no prospective studies comparing the test accuracy of DL versus human readers in clinical workflows. Diagnostic test results varied across the included studies. Among the 14 studies employing standalone DL systems, DL showed significantly lower sensitivities with comparable specificities in 5 studies and outperformed human readers at higher specificities in another 4 studies; in the remaining studies, DL models and human readers showed equivalent test outcomes. Among the 12 studies that assessed assistive DL systems, none demonstrated that DL assistance improved the overall diagnostic performance of human readers. Current evidence is insufficient to conclude that DL outperforms human readers or enhances the accuracy of diagnostic breast US in a clinical setting. Standardization of study methodologies is required to improve the reproducibility and generalizability of DL research, which will aid clinical translation and application.

Publication types

Review