Applying Explainable Machine Learning Models for Detection of Breast Cancer Lymph Node Metastasis in Patients Eligible for Neoadjuvant Treatment

Josip Vrdoljak; Zvonimir Boban; Domjan Barić; Darko Šegvić; Marko Kumrić; Manuela Avirović; Melita Perić Balja; Marija Milković Periša; Čedna Tomasović; Snježana Tomić; Eduard Vrdoljak; Joško Božić

doi:10.3390/cancers15030634

Cancers (Basel). 2023 Jan 19;15(3):634. doi: 10.3390/cancers15030634.

Affiliations

¹ Department of Pathophysiology, University of Split School of Medicine, 21000 Split, Croatia.
² Department of Biophysics, University of Split School of Medicine, 21000 Split, Croatia.
³ Department of Physics, University of Zagreb Faculty of Science, 10000 Zagreb, Croatia.
⁴ Sigmoid Lab, Postindustria Group, 21000 Split, Croatia.
⁵ Department of Pathology, University Hospital of Rijeka, 51000 Rijeka, Croatia.
⁶ Department of Pathology, Clinical Hospital Sestre Milosrdnice, 10000 Zagreb, Croatia.
⁷ Department of Pathology, Clinical Hospital Zagreb, 10000 Zagreb, Croatia.
⁸ Department of Pathology, Clinical Hospital Dubrava, 10000 Zagreb, Croatia.
⁹ Department of Pathology, University Hospital of Split, 21000 Split, Croatia.
¹⁰ Department of Oncology, University Hospital of Split, 21000 Split, Croatia.

Abstract

Background: Due to recent changes in breast cancer treatment strategy, significantly more patients are treated with neoadjuvant systemic therapy (NST). Radiological methods do not precisely determine axillary lymph node status, with up to 30% of patients being misdiagnosed. Hence, supplementary methods for lymph node status assessment are needed. This study aimed to apply and evaluate machine learning models on clinicopathological data, with a focus on patients meeting NST criteria, for lymph node metastasis prediction.

Methods: Of the breast cancer patients in the full dataset (n = 8381), 719 were identified as eligible for NST. Machine learning models were applied to both the NST-criteria group and the total study population. Model explainability was obtained by calculating Shapley values.
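To illustrate what a Shapley value measures, the following is a minimal sketch that computes exact Shapley values for a single hypothetical patient. Everything in it is an assumption made for illustration: the scoring function `toy_risk_score`, its coefficients, the feature names, and the patient and baseline values are invented and are not taken from the study, which does not describe its implementation in this abstract.

```python
from itertools import combinations
from math import factorial

# Hypothetical patient and reference values (illustrative only, not study data).
PATIENT = {"tumor_size_mm": 30.0, "ki67_pct": 40.0, "age_years": 45.0}
BASELINE = {"tumor_size_mm": 20.0, "ki67_pct": 20.0, "age_years": 60.0}


def toy_risk_score(x):
    """Made-up score with one interaction term; stands in for a fitted model."""
    return (
        0.03 * x["tumor_size_mm"]
        + 0.01 * x["ki67_pct"]
        - 0.005 * x["age_years"]
        + 0.0005 * x["tumor_size_mm"] * x["ki67_pct"]
    )


def coalition_value(known):
    """Score when only the features in `known` take the patient's values."""
    x = {
        name: (PATIENT[name] if name in known else BASELINE[name])
        for name in PATIENT
    }
    return toy_risk_score(x)


def shapley_values(value_fn, features):
    """Exact Shapley values by enumerating every coalition of the other features."""
    n = len(features)
    phi = {}
    for feature in features:
        others = [f for f in features if f != feature]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                gain = value_fn(coalition | {feature}) - value_fn(coalition)
                total += weight * gain
        phi[feature] = total
    return phi


phi = shapley_values(coalition_value, list(PATIENT))
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.3f}")
```

The contributions add up to the difference between the patient's score and the baseline score, which is what makes them usable as a per-feature explanation of one prediction. Exact enumeration grows exponentially with the number of features, so practical tools rely on approximations or model-specific algorithms.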

Results: In the NST-criteria group, random forest achieved the highest performance (AUC: 0.793 [0.713, 0.865]), while in the total study population, XGBoost performed best (AUC: 0.762 [0.726, 0.795]). Shapley values identified tumor size, Ki-67, and patient age as the most important predictors.
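The abstract does not state how the bracketed AUC intervals were obtained. As one possible reading, the sketch below computes an AUC and a percentile bootstrap interval on a tiny invented dataset. The labels, scores, function names, and the choice of a percentile bootstrap are all assumptions for illustration and should not be read as the authors' procedure.

```python
import random


def auc(labels, scores):
    """AUC as the chance a positive case outranks a negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_interval(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed, generic method)."""
    rng = random.Random(seed)
    indices = range(len(labels))
    stats = []
    while len(stats) < n_boot:
        sample = [rng.choice(indices) for _ in indices]
        ys = [labels[i] for i in sample]
        # Skip resamples that contain only one class; AUC is undefined there.
        if 0 < sum(ys) < len(ys):
            stats.append(auc(ys, [scores[i] for i in sample]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Invented example: 1 = node-positive, 0 = node-negative, with model scores.
labels = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.20, 0.80, 0.70, 0.50, 0.90]

point_estimate = auc(labels, scores)
lower, upper = bootstrap_auc_interval(labels, scores)
print(f"AUC: {point_estimate:.3f} [{lower:.3f}, {upper:.3f}]")
```

With only eight cases the interval is very wide; the narrower interval reported for the total study population than for the NST-criteria group is consistent with its much larger sample size.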

Conclusion: Tree-based models achieve good performance in assessing lymph node status. Such models can lead to more accurate disease stage prediction and, consequently, better treatment selection, especially for NST patients, for whom radiological and clinical findings are often the only means of lymph node assessment.

Keywords: breast cancer; lymph node metastasis; machine learning; neoadjuvant systemic treatment.

Grants and funding

This research received no external funding.