Ultrasound-based deep learning in the establishment of a breast lesion risk stratification system: a multicenter study

Eur Radiol. 2023 Apr;33(4):2954-2964. doi: 10.1007/s00330-022-09263-8. Epub 2022 Nov 23.

Abstract

Objectives: To establish a breast lesion risk stratification system using ultrasound images to predict breast malignancy and assess Breast Imaging Reporting and Data System (BI-RADS) categories simultaneously.

Methods: This multicenter study prospectively collected ultrasound images from 5012 patients at thirty-two hospitals between December 2018 and December 2020. A deep learning (DL) model was developed to perform binary categorization (benign vs. malignant) and six-way BI-RADS categorization (categories 2, 3, 4a, 4b, 4c, and 5) simultaneously. The training set of 4212 patients and the internal test set of 416 patients came from thirty hospitals. The 384 patients from the remaining two hospitals formed the external test set. Three experienced radiologists performed a reader study on 324 patients randomly selected from the test sets. We compared the performance of the DL model with that of each of the three radiologists and with their consensus.

Results: In the external test set, the DL model achieved areas under the receiver operating characteristic curve (AUCs) of 0.980 and 0.945 for the binary and six-way categorizations, respectively. In the reader study set, the DL model's BI-RADS categories, compared with the consensus of the three radiologists, achieved a similar AUC (0.901 vs. 0.933, p = 0.0632), sensitivity (90.98% vs. 95.90%, p = 0.1094), and accuracy (83.33% vs. 79.01%, p = 0.0541), but higher specificity (78.71% vs. 68.81%, p = 0.0012).
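
As an illustration of how the sensitivity, specificity, and accuracy figures above relate to one another, the following minimal Python sketch computes all three from a 2 × 2 confusion matrix. The counts are an assumption: they are back-calculated from the reported percentages and the 324-patient reader study size (122 malignant and 202 benign cases would reproduce every figure), and they are not taken from the paper. The function name `binary_metrics` is hypothetical.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) as fractions."""
    sensitivity = tp / (tp + fn)                # malignant cases correctly flagged
    specificity = tn / (tn + fp)                # benign cases correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)  # all cases correctly classified
    return sensitivity, specificity, accuracy


# ASSUMPTION: counts inferred from the reported percentages, not from the paper.
# Both rows sum to 324 patients (122 malignant, 202 benign).
dl_counts = dict(tp=111, fn=11, tn=159, fp=43)
consensus_counts = dict(tp=117, fn=5, tn=139, fp=63)

for name, counts in [("DL model", dl_counts), ("Consensus", consensus_counts)]:
    sens, spec, acc = binary_metrics(**counts)
    print(f"{name}: sensitivity={sens:.2%}, "
          f"specificity={spec:.2%}, accuracy={acc:.2%}")
```

Under these assumed counts, the difference in specificity corresponds to 20 fewer benign lesions flagged as suspicious by the DL model (43 vs. 63), at the cost of 6 additional missed malignancies (11 vs. 5).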

Conclusions: The DL model performed well in distinguishing benign from malignant breast lesions and yielded outcomes similar to experienced radiologists. This indicates the potential applicability of the DL model in clinical diagnosis.

Key points

  • The DL model can achieve binary categorization for benign and malignant breast lesions and six-way BI-RADS categorizations for categories 2, 3, 4a, 4b, 4c, and 5, simultaneously.
  • The DL model showed acceptable agreement with radiologists for the classification of breast lesions.
  • The DL model performed well in distinguishing benign from malignant breast lesions and had promise in helping reduce unnecessary biopsies of BI-RADS 4a lesions.

Keywords: Artificial intelligence; Breast neoplasms; Deep learning; Diagnosis; Ultrasonography.

Publication types

  • Multicenter Study

MeSH terms

  • Breast / diagnostic imaging
  • Breast Neoplasms* / pathology
  • Deep Learning*
  • Female
  • Humans
  • Retrospective Studies
  • Risk Assessment
  • Ultrasonography
  • Ultrasonography, Mammary / methods