Developing a Prediction Model for Pathologic Complete Response Following Neoadjuvant Chemotherapy in Breast Cancer: A Comparison of Model Building Approaches

Robert B Basmadjian; Shiying Kong; Devon J Boyne; Tamer N Jarada; Yuan Xu; Winson Y Cheung; Sasha Lupichuk; May Lynn Quan; Darren R Brenner

doi:10.1200/CCI.21.00055

JCO Clin Cancer Inform. 2022 Feb:6:e2100055. doi: 10.1200/CCI.21.00055.

Authors

Robert B Basmadjian¹, Shiying Kong¹ ² ³, Devon J Boyne¹ ², Tamer N Jarada², Yuan Xu¹ ² ³, Winson Y Cheung¹ ², Sasha Lupichuk¹ ², May Lynn Quan¹ ² ³, Darren R Brenner¹ ²

Affiliations

¹ Department of Community Health Sciences, Foothills Medical Centre, University of Calgary, Calgary, Alberta, Canada.
² Department of Oncology, University of Calgary, Tom Baker Cancer Centre, Calgary, Alberta, Canada.
³ Department of Surgery, Foothills Medical Centre, University of Calgary, Calgary, Alberta, Canada.

Abstract

Purpose: Identifying which characteristics of patients with breast cancer should guide the recommendation of neoadjuvant chemotherapy is an active area of clinical research. We developed and compared several approaches to building prediction models for pathologic complete response (pCR) among patients with breast cancer in Alberta.

Methods: The study included all patients with breast cancer who received neoadjuvant chemotherapy in Alberta between 2012 and 2014, identified from the Alberta Cancer Registry. Patient, tumor, and treatment data were obtained through primary chart review. pCR was defined as no residual invasive tumor in the breast or axilla at surgical excision. Two types of prediction models for pCR were built: (1) an expert model, with variables selected on the basis of oncologists' opinions, and (2) a data-driven model, with variables selected by a trained machine learning algorithm. Each model type was fit using logistic regression (LR), random forests (RF), and gradient-boosted trees (GBT). We compared the models using the area under the receiver operating characteristic curve and the integrated calibration index, and internally validated them using bootstrap resampling.
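The internal validation step can be illustrated with a minimal sketch of a bootstrap optimism correction for the area under the receiver operating characteristic curve. This is one common way to perform such a correction and is not taken from the study itself; the function names `auc`, `fit_toy`, and `optimism_corrected_auc` are hypothetical, and `fit_toy` is a deliberately simple stand-in for the LR, RF, and GBT fits.

```python
import random


def auc(y_true, scores):
    """Area under the ROC curve as the probability that a positive case
    scores higher than a negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def fit_toy(X, y):
    """Hypothetical stand-in model: weights are the per-feature difference
    between class means. Returns a function that scores one row."""
    k = len(X[0])
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    w = [sum(r[j] for r in pos) / len(pos) - sum(r[j] for r in neg) / len(neg)
         for j in range(k)]
    return lambda row: sum(wj * v for wj, v in zip(w, row))


def optimism_corrected_auc(X, y, fit, n_boot=200, seed=0):
    """Apparent AUC minus the mean bootstrap optimism.

    Optimism for one resample = AUC of the resample-trained model on the
    resample minus its AUC on the original data."""
    rng = random.Random(seed)
    n = len(y)
    model = fit(X, y)
    apparent = auc(y, [model(x) for x in X])
    optimism = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        if len(set(yb)) < 2:  # skip resamples missing an outcome class
            continue
        mb = fit(Xb, yb)
        auc_boot = auc(yb, [mb(x) for x in Xb])
        auc_orig = auc(y, [mb(x) for x in X])
        optimism.append(auc_boot - auc_orig)
    if not optimism:
        return apparent
    return apparent - sum(optimism) / len(optimism)
```

Any fitting routine that returns a row-scoring function can be passed as `fit`, so the same correction could be applied to each model type under comparison.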

Results: A total of 363 cases were included in the analyses, of which 86 (23.7%) experienced pCR. The RF and GBT fits yielded higher optimism-corrected areas under the receiver operating characteristic curve than LR for both the expert (RF: 0.70; GBT: 0.69; LR: 0.65) and data-driven models (RF: 0.71; GBT: 0.68; LR: 0.64). The LR fit yielded the lowest integrated calibration indices for both the expert (LR: 0.037; GBT: 0.05; RF: 0.10) and data-driven models (LR: 0.026; GBT: 0.06; RF: 0.099).
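To give a sense of what a calibration index measures, the sketch below computes a weighted mean absolute difference between predicted and observed event rates. The integrated calibration index is usually defined with a smoothed (loess) calibration curve; this example swaps in equal-size bins as a simpler approximation, so it is not the metric reported above. The function name `binned_calibration_error` and the `n_bins` parameter are hypothetical.

```python
def binned_calibration_error(y_true, probs, n_bins=10):
    """Weighted mean absolute gap between mean predicted probability and
    observed event rate across equal-size bins of sorted predictions.

    Assumption: bins replace the loess smoother used by the integrated
    calibration index, so values are only roughly comparable."""
    pairs = sorted(zip(probs, y_true))
    n = len(pairs)
    total = 0.0
    for b in range(n_bins):
        lo = b * n // n_bins
        hi = (b + 1) * n // n_bins
        chunk = pairs[lo:hi]
        if not chunk:  # more bins than observations
            continue
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(t for _, t in chunk) / len(chunk)
        total += len(chunk) * abs(mean_pred - observed)
    return total / n
```

Lower values indicate that predicted probabilities track observed pCR rates more closely; a value of 0 means the two agree in every bin.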

Conclusion: Our models demonstrated predictive ability for pCR using routinely collected clinical and demographic variables. We show that machine learning fitting methods can be used to optimize models for pCR prediction. We also show that adding variables beyond those selected through clinical expertise does not considerably improve predictive ability and may not justify the burden of data collection.

MeSH terms

Breast / pathology
Breast Neoplasms* / drug therapy
Breast Neoplasms* / pathology
Female
Humans
Machine Learning
Neoadjuvant Therapy* / methods
ROC Curve