A New Method of Identifying Pathologic Complete Response After Neoadjuvant Chemotherapy for Breast Cancer Patients Using a Population-Based Electronic Medical Record System

Ann Surg Oncol. 2023 Apr;30(4):2095-2103. doi: 10.1245/s10434-022-12955-6. Epub 2022 Dec 21.

Abstract

Background: Accurate identification of pathologic complete response (pCR) from population-based electronic narrative data in a timely and cost-efficient manner is critical. This study aimed to derive and validate a set of natural language processing (NLP)-based machine-learning algorithms to capture pCR from surgical pathology reports of breast cancer patients who underwent neoadjuvant chemotherapy (NAC).

Methods: This retrospective cohort study included all invasive breast cancer patients who underwent NAC and subsequent curative-intent surgery during their admission at all four tertiary acute care hospitals in Calgary, Alberta, Canada, between 1 January 2010 and 31 December 2017. Surgical pathology reports were extracted and processed with NLP. Decision tree classifiers were constructed and validated against chart review results. Machine-learning algorithms were evaluated with a set of performance metrics: sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), accuracy, area under the receiver operating characteristic curve (AUC), and F1 score.
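
For readers less familiar with these measures, the sketch below shows how most of them follow from the four cells of a confusion matrix, here imagined as algorithm output compared against chart review. The function name and the counts are hypothetical, chosen only for illustration; they are not taken from the study, and AUC is omitted because it cannot be computed from a single set of counts in general.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Return common binary-classification metrics from confusion-matrix counts.

    tp/fp/fn/tn are counts of true positives, false positives,
    false negatives, and true negatives against the reference standard.
    """
    return {
        "sensitivity": tp / (tp + fn),            # true positives among all actual positives
        "specificity": tn / (tn + fp),            # true negatives among all actual negatives
        "ppv": tp / (tp + fp),                    # true positives among predicted positives
        "npv": tn / (tn + fn),                    # true negatives among predicted negatives
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "f1": 2 * tp / (2 * tp + fp + fn),        # harmonic mean of sensitivity and PPV
    }


# Hypothetical counts for illustration only (not reported in the abstract).
metrics = confusion_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, a higher-sensitivity operating point would trade false negatives for false positives, which is the trade-off the high-sensitivity and high-PPV algorithms represent.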

Results: The study included 351 female patients. Of these patients, 102 (29%) achieved pCR after NAC. The high-sensitivity model achieved a sensitivity of 90.5% (95% confidence interval [CI], 69.6-98.9%), a PPV of 76% (95% CI, 59.6-87.2%), an accuracy of 88.6% (95% CI, 78.7-94.9%), an AUC of 0.891 (95% CI, 0.795-0.987), and an F1 score of 82.61. The high-PPV algorithm reached a sensitivity of 85.7% (95% CI, 63.7-97%), a PPV of 81.8% (95% CI, 63.4-92.1%), an accuracy of 90% (95% CI, 80.5-95.9%), an AUC of 0.888 (95% CI, 0.790-0.985), and an F1 score of 83.72. The high-F1 score algorithm obtained a performance equivalent to that of the high-PPV algorithm.
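
The reported F1 scores can be checked against the reported sensitivity and PPV, since F1 is the harmonic mean of the two. The short sketch below does this using only the rounded percentages given above; the helper name is hypothetical, and because the inputs are rounded, agreement is expected only to about one decimal place on the percentage scale.

```python
def f1_from_rates(sensitivity, ppv):
    """F1 score as the harmonic mean of sensitivity (recall) and PPV (precision)."""
    return 2 * sensitivity * ppv / (sensitivity + ppv)


# Rounded values as reported in the abstract, expressed as proportions.
f1_high_sensitivity = f1_from_rates(0.905, 0.76)
f1_high_ppv = f1_from_rates(0.857, 0.818)

# Express on the 0-100 scale used in the abstract.
print(round(f1_high_sensitivity * 100, 1))  # reported F1: 82.61
print(round(f1_high_ppv * 100, 1))          # reported F1: 83.72
```

Both recomputed values fall within 0.1 of the reported scores, so the reported figures are internally consistent with F1 being expressed as a percentage.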

Conclusion: The developed algorithms demonstrated excellent accuracy in identifying pCR from surgical pathology reports of breast cancer patients who received NAC treatment.

MeSH terms

  • Algorithms
  • Breast Neoplasms* / drug therapy
  • Breast Neoplasms* / pathology
  • Breast Neoplasms* / surgery
  • Electronic Health Records
  • Female
  • Humans
  • Neoadjuvant Therapy / methods
  • Retrospective Studies