Development and External Validation of a Machine Learning Model to Predict Pathological Complete Response After Neoadjuvant Chemotherapy in Breast Cancer

J Breast Cancer. 2023 Aug;26(4):353-362. doi: 10.4048/jbc.2023.26.e14. Epub 2023 Mar 28.

Abstract

Purpose: Several models have been developed to predict pathological complete response (pCR) after neoadjuvant chemotherapy (NAC); however, few are broadly applicable owing to radiologic complexity and institution-specific clinical variables, and none have been externally validated. This study aimed to develop and externally validate a machine learning model that predicts pCR after NAC in patients with breast cancer using routinely collected clinical and demographic variables.

Methods: The electronic medical records of patients with advanced breast cancer who underwent NAC before surgical resection between January 2017 and December 2020 were reviewed. Patient data from Seoul National University Bundang Hospital were divided into training and internal validation cohorts. Five machine learning techniques (gradient boosting machine [GBM], support vector machine, random forest, decision tree, and neural network) were used to build predictive models, and their areas under the receiver operating characteristic curve (AUCs) were compared to select the best model. Finally, the selected model was validated using an independent cohort from Seoul National University Hospital.
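The AUC-based model selection described in the Methods can be illustrated with a minimal sketch. It is not the study's code: the function names `roc_auc` and `select_best_model`, the toy labels, and the toy scores are all hypothetical, and the AUC is computed with the plain pairwise (Mann-Whitney) definition rather than any particular library.

```python
from typing import Dict, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def select_best_model(
    labels: Sequence[int], model_scores: Dict[str, Sequence[float]]
) -> Tuple[str, Dict[str, float]]:
    """Return the name of the model with the highest AUC, plus all AUCs."""
    aucs = {name: roc_auc(labels, scores) for name, scores in model_scores.items()}
    best = max(aucs, key=aucs.get)
    return best, aucs


# Hypothetical toy validation set (1 = pCR, 0 = no pCR); not study data.
validation_labels = [1, 0, 1, 0, 1, 0]
# Hypothetical predicted probabilities from two candidate models.
candidate_scores = {
    "gbm": [0.9, 0.2, 0.8, 0.3, 0.7, 0.4],
    "decision_tree": [0.6, 0.5, 0.4, 0.7, 0.8, 0.3],
}

best_name, auc_by_model = select_best_model(validation_labels, candidate_scores)
```

In practice the scores would come from models fitted on the training cohort and evaluated on the internal validation cohort; the pairwise loop is quadratic in cohort size, which is acceptable for cohorts of this scale.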

Results: A total of 1,003 patients were included in the study: 287, 71, and 645 in the training, internal validation, and external validation cohorts, respectively. Overall, 36.3% of the patients achieved pCR. Among the five machine learning models, the GBM showed the highest AUC for pCR prediction (AUC, 0.903; 95% confidence interval [CI], 0.833-0.972). External validation confirmed an AUC of 0.833 (95% CI, 0.800-0.865).
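The abstract does not state how the 95% CIs for the AUC were obtained. As an illustration only, the sketch below uses the Hanley-McNeil approximation, one common way to attach an interval to an AUC. The function name `hanley_mcneil_ci` is hypothetical, and the split of the 645 external validation patients into pCR and non-pCR cases is an assumption made by applying the overall 36.3% pCR rate to that cohort; the abstract does not report the cohort-specific rate.

```python
import math
from typing import Tuple


def hanley_mcneil_ci(
    auc: float, n_pos: int, n_neg: int, z: float = 1.96
) -> Tuple[float, float]:
    """Approximate CI for an AUC using the Hanley-McNeil standard error."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    se = math.sqrt(variance)
    # Clamp to [0, 1] because an AUC cannot leave that range.
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


# External validation cohort size and AUC are taken from the abstract.
n_total = 645
reported_auc = 0.833
# ASSUMPTION: the overall 36.3% pCR rate also holds in the external cohort.
n_pos = round(0.363 * n_total)
n_neg = n_total - n_pos

lower, upper = hanley_mcneil_ci(reported_auc, n_pos, n_neg)
```

The resulting bounds can be compared with the reported interval, bearing in mind that the class split is assumed and that the authors may have used a different method, such as DeLong's or a bootstrap.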

Conclusion: Commonly available clinical and demographic variables were used to develop a machine learning model for predicting pCR following NAC. External validation of the model demonstrated good discriminative power (AUC, 0.833), indicating that routinely collected variables were sufficient to build a good prediction model.

Keywords: Breast Neoplasms; Machine Learning; Neoadjuvant Therapy.