Machine Learning Frameworks to Predict Neoadjuvant Chemotherapy Response in Breast Cancer Using Clinical and Pathological Features

Nicholas Meti; Khadijeh Saednia; Andrew Lagree; Sami Tabbarah; Majid Mohebpour; Alex Kiss; Fang-I Lu; Elzbieta Slodkowska; Sonal Gandhi; Katarzyna Joanna Jerzak; Lauren Fleshner; Ethan Law; Ali Sadeghi-Naini; William T Tran

doi:10.1200/CCI.20.00078

JCO Clin Cancer Inform. 2021 Jan:5:66-80. doi: 10.1200/CCI.20.00078.

Authors

Nicholas Meti¹,², Khadijeh Saednia³, Andrew Lagree⁴, Sami Tabbarah⁴, Majid Mohebpour⁴, Alex Kiss⁵, Fang-I Lu⁶, Elzbieta Slodkowska⁶, Sonal Gandhi¹,⁷, Katarzyna Joanna Jerzak¹,⁷, Lauren Fleshner⁴, Ethan Law⁴, Ali Sadeghi-Naini²,³,⁴, William T Tran²,⁴,⁸

Affiliations

¹ Division of Medical Oncology, Department of Medicine, University of Toronto, Toronto, ON, Canada.
² Temerty Centre for AI Research and Education in Medicine, University of Toronto, Toronto, ON, Canada.
³ Department of Electrical Engineering and Computer Science, Lassonde School of Engineering, York University, Toronto, ON, Canada.
⁴ Department of Radiation Oncology, Sunnybrook Health Sciences Centre, Toronto, ON, Canada.
⁵ Institute of Clinical Evaluative Sciences, Sunnybrook Health Sciences Centre, Toronto, ON, Canada.
⁶ Department of Laboratory Medicine and Molecular Diagnostics, Sunnybrook Health Sciences Centre, Toronto, ON, Canada.
⁷ Division of Medical Oncology, Sunnybrook Health Sciences Centre, Toronto, ON, Canada.
⁸ Department of Radiation Oncology, University of Toronto, Toronto, ON, Canada.

PMID: 33439725

Abstract

Purpose: Neoadjuvant chemotherapy (NAC) is used to treat locally advanced breast cancer (LABC) and high-risk early breast cancer (BC). Pathological complete response (pCR) has prognostic value that depends on BC subtype, but pCR rates vary. Predictive modeling is desirable to identify, early in treatment, patients who may respond suboptimally to NAC. Here, we test the predictive performance of machine learning (ML) models and compare it with that of a standard statistical model, using clinical and pathological data.

Methods: Clinical and pathological variables were collected from 431 patients, including tumor size, patient demographics, histological characteristics, molecular status, and staging information. A standard multivariable logistic regression (MLR) model was developed and compared with five ML models: a k-nearest neighbor classifier, a random forest (RF) classifier, a naive Bayes algorithm, a support vector machine, and a multilayer perceptron. Model performance was measured using receiver operating characteristic (ROC) analysis and compared statistically.
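
As a rough illustration of the ROC analysis named in the Methods, the area under the ROC curve can be computed as the probability that a randomly chosen responder receives a higher predicted score than a randomly chosen non-responder. The sketch below is not the authors' code: the function name `roc_auc` and the toy labels and scores are hypothetical, chosen only to show the calculation.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the
    positive case has the higher score; tied pairs count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the study): 1 = pCR, 0 = no pCR.
toy_labels = [1, 0, 1, 0, 1, 0]
toy_scores = [0.9, 0.4, 0.6, 0.7, 0.8, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"toy AUC = {toy_auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the models in this study are compared.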

Results: MLR predictors of NAC response included estrogen receptor (ER) status, human epidermal growth factor receptor 2 (HER2) status, tumor size, and Nottingham grade. The strongest MLR predictors of pCR were HER2+ versus HER2- BC (odds ratio [OR], 0.13; 95% CI, 0.07 to 0.23; P < .001) and Nottingham grade G3 versus G1-2 (G1-2: OR, 0.36; 95% CI, 0.20 to 0.65; P < .001). The area under the ROC curve (AUC) for the MLR was 0.64. Among the ML models, the RF classifier performed best, with an AUC of 0.88, sensitivity of 70.7%, and specificity of 84.6%; it included the following variables: menopausal status, ER status, HER2 status, Nottingham grade, tumor size, nodal status, and presence of inflammatory BC.
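
For readers less familiar with these quantities, the following sketch shows how an odds ratio and its confidence interval relate to a logistic regression coefficient (OR = exp(β), with a Wald 95% CI of exp(β ± 1.96·SE)), and how sensitivity and specificity are derived from confusion-matrix counts. It assumes the reported interval is a Wald interval, which the abstract does not state; the function names and the confusion-matrix counts are hypothetical and are not taken from the study.

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def implied_beta_and_se(odds_ratio: float, ci_low: float, ci_high: float):
    """Back out the log-odds coefficient and its standard error from a
    reported OR and 95% CI (assumes a symmetric Wald interval on the log scale)."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    return beta, se


def wald_ci(beta: float, se: float):
    """95% Wald confidence interval for the odds ratio exp(beta)."""
    return math.exp(beta - Z_95 * se), math.exp(beta + Z_95 * se)


def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


# HER2 odds ratio and interval as reported in the abstract.
beta, se = implied_beta_and_se(0.13, 0.07, 0.23)
low, high = wald_ci(beta, se)
print(f"beta = {beta:.3f}, SE = {se:.3f}, CI = ({low:.3f}, {high:.3f})")

# Hypothetical counts, for illustration only.
sens, spec = sensitivity_specificity(tp=7, fn=3, tn=8, fp=2)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Because the published values are rounded to two decimals, the interval rebuilt from the implied coefficient matches the reported one only approximately.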

Conclusion: Predictive performance differed between the standard statistical model and the ML classification methods. The RF classifier demonstrated the best predictive performance among all models.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Bayes Theorem
Breast
Breast Neoplasms* / therapy
Female
Humans
Machine Learning*
Neoadjuvant Therapy*

Grants and funding

CIHR/Canada