Machine learning approaches for predicting 5-year breast cancer survival: A multicenter study

Cancer Sci. 2023 Oct;114(10):4063-4072. doi: 10.1111/cas.15917. Epub 2023 Jul 25.

Abstract

The study used clinical data to develop a prediction model for breast cancer survival. Breast cancer prognostic factors were explored using machine learning techniques. We conducted a retrospective study using data from the Taipei Medical University Clinical Research Database, which contains electronic medical records from three affiliated hospitals in Taiwan. The study included female patients aged over 20 years who were diagnosed with primary breast cancer and had medical records in hospitals between January 1, 2009 and December 31, 2020. The data were divided into training and external testing datasets. Nine different machine learning algorithms were applied to develop the models. The performances of the algorithms were measured using the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F1-score. A total of 3914 patients were included in the study. The highest AUC of 0.95 was observed with the artificial neural network model (accuracy, 0.90; sensitivity, 0.71; specificity, 0.73; PPV, 0.28; NPV, 0.94; and F1-score, 0.37). Other models showed relatively high AUC, ranging from 0.75 to 0.83. According to the optimal model results, cancer stage, tumor size, diagnosis age, surgery, and body mass index were the most critical factors for predicting breast cancer survival. The study successfully established accurate 5-year survival predictive models for breast cancer. Furthermore, the study found key factors that could affect breast cancer survival in Taiwanese women. Its results might be used as a reference for the clinical practice of breast cancer treatment.

Keywords: TMUCRD; breast cancer survival; machine learning; prediction models; real-world data.

Publication types

  • Multicenter Study

MeSH terms

  • Adult
  • Breast Neoplasms*
  • Female
  • Humans
  • Machine Learning
  • Predictive Value of Tests
  • ROC Curve
  • Retrospective Studies