Development and External Validation of a Machine Learning Model to Predict Pathological Complete Response After Neoadjuvant Chemotherapy in Breast Cancer

Ji-Jung Jung; Eun-Kyu Kim; Eunyoung Kang; Jee Hyun Kim; Se Hyun Kim; Koung Jin Suh; Sun Mi Kim; Mijung Jang; Bo La Yun; So Yeon Park; Changjin Lim; Wonshik Han; Hee-Chul Shin

doi:10.4048/jbc.2023.26.e14

J Breast Cancer. 2023 Aug;26(4):353-362. doi: 10.4048/jbc.2023.26.e14. Epub 2023 Mar 28.

Authors

Ji-Jung Jung¹, Eun-Kyu Kim¹,², Eunyoung Kang¹,², Jee Hyun Kim³, Se Hyun Kim³, Koung Jin Suh³, Sun Mi Kim⁴, Mijung Jang⁴, Bo La Yun⁴, So Yeon Park⁵, Changjin Lim¹, Wonshik Han¹,⁶, Hee-Chul Shin¹,⁷

Affiliations

¹ Department of Surgery, Seoul National University College of Medicine, Seoul, Korea.
² Department of Surgery, Seoul National University Bundang Hospital, Seoul National University College of Medicine, Seongnam, Korea.
³ Department of Internal Medicine, Seoul National University Bundang Hospital, Seoul National University College of Medicine, Seongnam, Korea.
⁴ Department of Radiology, Seoul National University Bundang Hospital, Seoul National University College of Medicine, Seongnam, Korea.
⁵ Department of Pathology, Seoul National University Bundang Hospital, Seoul National University College of Medicine, Seongnam, Korea.
⁶ Cancer Research Institute, Seoul National University College of Medicine, Seoul, Korea.
⁷ Department of Surgery, Seoul National University Bundang Hospital, Seoul National University College of Medicine, Seongnam, Korea. dradam77@naver.com.

Abstract

Purpose: Several models have been developed to predict pathological complete response (pCR) after neoadjuvant chemotherapy (NAC); however, few are broadly applicable owing to radiologic complexity and institution-specific clinical variables, and none have been externally validated. This study aimed to develop and externally validate a machine learning model that predicts pCR after NAC in patients with breast cancer using routinely collected clinical and demographic variables.

Methods: The electronic medical records of patients with advanced breast cancer who underwent NAC before surgical resection between January 2017 and December 2020 were reviewed. Patient data from Seoul National University Bundang Hospital were divided into training and internal validation cohorts. Five machine learning techniques (gradient boosting machine [GBM], support vector machine, random forest, decision tree, and neural network) were used to build predictive models, and their areas under the receiver operating characteristic curve (AUCs) were compared to select the best model. Finally, the selected model was validated using an independent cohort from Seoul National University Hospital.
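The abstract does not report the predictor list, preprocessing, or hyperparameters, so the following is only a minimal sketch of the general workflow it describes, not the authors' pipeline. It assumes scikit-learn and uses synthetic stand-in data from `make_classification`; the estimator settings, the number of features, and the index-based split into two "institutions" are hypothetical choices. Note that scikit-learn's `GradientBoostingClassifier` is used as the GBM here, which may differ from the implementation used in the study. The sketch trains the five model families on a 287-patient training set, compares AUCs on a 71-patient internal validation set, and scores the selected model once on a 645-patient external set.

```python
# Minimal sketch of the Methods workflow, NOT the authors' code.
# Assumptions: scikit-learn is installed; data are synthetic; hyperparameters are hypothetical.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic "clinical" features; weights=[0.64] makes roughly 36% of labels positive (pCR = 1).
X, y = make_classification(n_samples=1003, n_features=12, n_informative=6,
                           weights=[0.64], random_state=0)

# Hypothetical split mirroring the cohort sizes: 358 development patients
# (287 training + 71 internal validation) and 645 external patients.
X_dev, y_dev = X[:358], y[:358]
X_ext, y_ext = X[358:], y[358:]
X_train, X_int, y_train, y_int = train_test_split(
    X_dev, y_dev, test_size=71, stratify=y_dev, random_state=0)

# The five model families named in the abstract; scale-sensitive models get a StandardScaler.
models = {
    "GBM": GradientBoostingClassifier(random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)),
    "Random forest": RandomForestClassifier(n_estimators=300, random_state=0),
    "Decision tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "Neural network": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)),
}

# Fit on the training cohort; compute AUC on the internal validation cohort
# from the predicted probability of the positive class.
internal_auc = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    internal_auc[name] = roc_auc_score(y_int, model.predict_proba(X_int)[:, 1])

# Select the model with the highest internal AUC, then evaluate it once on the external cohort.
best_name = max(internal_auc, key=internal_auc.get)
external_auc = roc_auc_score(y_ext, models[best_name].predict_proba(X_ext)[:, 1])

for name, score in sorted(internal_auc.items(), key=lambda kv: -kv[1]):
    print(f"{name:15s} internal AUC = {score:.3f}")
print(f"Selected: {best_name}; external AUC = {external_auc:.3f}")
```

Because the synthetic "external" rows come from the same generator as the development rows, this sketch cannot reproduce the between-hospital differences that make a real external validation informative; it only illustrates the order of operations (fit, internal comparison, single external evaluation).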

Results: A total of 1,003 patients were included in the study: 287, 71, and 645 in the training, internal validation, and external validation cohorts, respectively. Overall, 36.3% of the patients achieved pCR. Among the five machine learning models, the GBM showed the highest AUC for pCR prediction (AUC, 0.903; 95% confidence interval [CI], 0.833-0.972). In external validation, the GBM model achieved an AUC of 0.833 (95% CI, 0.800-0.865).
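Each AUC above is reported with a 95% CI, but the abstract does not state how the intervals were computed. One common option is the percentile bootstrap, sketched below in standard-library Python under that assumption; DeLong's method is a frequently used analytic alternative, and the study may have used either or another approach. The function names `auc` and `bootstrap_auc_ci` and the simulated scores are hypothetical. The AUC is computed as the Mann-Whitney statistic (the probability that a randomly chosen pCR case receives a higher score than a randomly chosen non-pCR case, with ties counted as one half), and patients are resampled with replacement.

```python
# Sketch of a percentile-bootstrap 95% CI for the AUC (standard library only).
# Assumption: this is one common CI method, not necessarily the one used in the study.
import random


def auc(y_true, scores):
    """AUC via the Mann-Whitney U statistic; labels must be 0/1, ties get average ranks."""
    n = len(scores)
    n_pos = sum(y_true)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative case")
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    pos_rank_sum = sum(r for r, t in zip(ranks, y_true) if t == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc_ci(y_true, scores, n_boot=2000, alpha=0.05, seed=0):
    """Return (AUC, lower, upper) by resampling patients with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = rng.choices(range(n), k=n)
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return auc(y_true, scores), lower, upper


# Usage with simulated data: about 36% positives, positives score higher on average.
sim = random.Random(1)
y_sim = [1 if sim.random() < 0.36 else 0 for _ in range(300)]
s_sim = [sim.gauss(1.0 if t else 0.0, 1.0) for t in y_sim]
point, lo, hi = bootstrap_auc_ci(y_sim, s_sim)
print(f"AUC, {point:.3f}; 95% CI, {lo:.3f}-{hi:.3f}")
```

The rank-based formulation keeps each AUC evaluation at O(n log n), which matters when the statistic is recomputed thousands of times on a cohort the size of the external validation set.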

Conclusion: A machine learning model for predicting pCR following NAC was developed using commonly available clinical and demographic variables. External validation of the model demonstrated good discriminative power, indicating that routinely collected variables were sufficient to build a good prediction model.

Keywords: Breast Neoplasms; Machine Learning; Neoadjuvant Therapy.