Predicting the molecular subtype of breast cancer and identifying interpretable imaging features using machine learning algorithms

Eur Radiol. 2022 Mar;32(3):1652-1662. doi: 10.1007/s00330-021-08271-4. Epub 2021 Oct 13.

Abstract

Objectives: To evaluate the performance of interpretable machine learning models in predicting breast cancer molecular subtypes.

Methods: We retrospectively enrolled 600 patients with invasive breast carcinoma between 2012 and 2019. The patients were randomly divided into a training set (n = 450) and a testing set (n = 150). Five models were constructed and trained on clinical characteristics and imaging features (mammography and ultrasonography). Model classification performance was evaluated using the area under the receiver operating characteristic (ROC) curve (AUC), accuracy, sensitivity, and specificity. The Shapley additive explanations (SHAP) technique was used to interpret the output of the optimal model. We then used the optimal model as an assistive tool and evaluated the performance of four radiologists in predicting the molecular subtype of breast cancer from mammography and ultrasound images, with and without model assistance.
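As an illustration of the evaluation metrics named above, the following minimal sketch computes accuracy, sensitivity, specificity, and AUC for a binary task such as TNBC versus other subtypes. The labels, scores, and the 0.5 threshold are made-up toy values, and the function names are hypothetical; none of this is taken from the study's data or code.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity from binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case outscores a
    random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (assumed values): 1 = TNBC, 0 = other subtype.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]  # hypothetical model outputs
y_pred = [1 if s >= 0.5 else 0 for s in scores]      # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, round(auc, 3))
```

Accuracy, sensitivity, and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason the four measures are usually reported together.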

Results: The decision tree (DT) model performed best in distinguishing triple-negative breast cancer (TNBC) from other breast cancer subtypes, yielding an AUC of 0.971, accuracy of 0.947, sensitivity of 0.905, and specificity of 0.941. With the assistance of the DT model, the accuracy, sensitivity, and specificity of all radiologists improved significantly in distinguishing TNBC from other molecular subtypes and Luminal breast cancer from other molecular subtypes. In the diagnosis of TNBC versus other subtypes, the average sensitivity, specificity, and accuracy increased by 0.090, 0.125, and 0.114, respectively, for less experienced radiologists and by 0.060, 0.090, and 0.083 for more experienced radiologists. In the diagnosis of Luminal versus other subtypes, the corresponding increases were 0.084, 0.152, and 0.159 for less experienced radiologists and 0.020, 0.100, and 0.048 for more experienced radiologists.

Conclusions: This study established an interpretable machine learning model to differentiate between breast cancer molecular subtypes, providing added value for radiologists.

Key points:

  • An interpretable machine learning model (MLM) could help clinicians and radiologists differentiate between breast cancer molecular subtypes.
  • The Shapley additive explanations (SHAP) technique can select important features for predicting the molecular subtypes of breast cancer from a large number of imaging signs.
  • A machine learning model can assist radiologists in evaluating the molecular subtype of breast cancer to some extent.
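To make the idea behind SHAP concrete, the sketch below computes exact Shapley values for a tiny hand-written scoring function. It shows only the underlying definition (a weighted average of each feature's marginal contribution over all feature subsets), not the SHAP library or the study's model. The feature names `margin`, `calcification`, and `echo` and all weights are hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, List


def shapley_values(value_fn: Callable[[FrozenSet[str]], float],
                   features: List[str]) -> Dict[str, float]:
    """Exact Shapley value of each feature for a set-valued payoff function."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):  # size of the subset that excludes f
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of f when added to subset s.
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi


def toy_score(present: FrozenSet[str]) -> float:
    """Hypothetical model output given which imaging signs are 'known'."""
    score = 0.0
    if "margin" in present:
        score += 0.2
    if "calcification" in present:
        score += 0.1
    if "margin" in present and "echo" in present:
        score += 0.1  # interaction term, shared between the two features
    return score


features = ["margin", "calcification", "echo"]
phi = shapley_values(toy_score, features)
print(phi)
```

The values sum to the difference between the score with all features and the score with none, which is the additivity property that lets features be ranked by their contribution. Exact enumeration grows exponentially with the number of features, so practical tools rely on faster model-specific or approximate algorithms.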

Keywords: BI-RADS; Computer-aided diagnosis; Interpretable machine learning; Mammography and ultrasonography; Molecular subtype breast cancer.

Publication types

  • Randomized Controlled Trial

MeSH terms

  • Algorithms
  • Breast Neoplasms* / diagnostic imaging
  • Female
  • Humans
  • Machine Learning
  • Mammography
  • Retrospective Studies
  • Triple Negative Breast Neoplasms*