Establishment and Verification of a Bagged-Trees-Based Model for Prediction of Sentinel Lymph Node Metastasis for Early Breast Cancer Patients

Front Oncol. 2019 Apr 16:9:282. doi: 10.3389/fonc.2019.00282. eCollection 2019.

Abstract

Purpose: Lymph node metastasis is a multifactorial event. Several scholars have developed nomogram models to predict sentinel lymph node (SLN) metastasis before operation. Based on the clinical and pathological characteristics of breast cancer patients, we used a new method to establish a more comprehensive model, added factors that have not previously been analyzed, and explored the prospects for its clinical application.
Materials and methods: The clinicopathological data of 633 patients with breast cancer who underwent SLN examination from January 2011 to December 2014 were retrospectively analyzed. Because the data were imbalanced, we used the SMOTE algorithm to oversample them and obtain a balanced data set. For the first time, our study included the shape of the tumor and breast gland content. Tumor location was analyzed with a combined vector-and-quadrant method; for comparison, we also analyzed location using the quadrant alone or the vector alone. We also compared the predictive ability of models built with logistic regression and with the Bagged-Tree algorithm. The Bagged-Tree algorithm was used to classify samples. The SMOTE-Bagged Tree algorithm and 5-fold cross-validation were used to establish the prediction model. The clinical application value of the model in early breast cancer patients was evaluated by the confusion matrix and the area under the receiver operating characteristic (ROC) curve (AUC).
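The oversampling step mentioned in the methods can be illustrated with a minimal sketch of the SMOTE idea: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its nearest minority-class neighbours. This is not the authors' implementation; the function name `smote`, the parameter values, and the two-feature toy data are all hypothetical and chosen only for illustration.

```python
import random
from math import dist


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Illustrative sketch only; names and defaults are assumptions.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy data: 4 minority samples vs. 10 majority samples.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
majority = [(float(i), float(i % 3)) for i in range(10)]

new_points = smote(minority, n_new=len(majority) - len(minority), k=3)
balanced_minority = minority + new_points
print(len(balanced_minority), len(majority))
```

Because every synthetic point is a convex combination of two real minority samples, the new points stay inside the range spanned by the original minority class rather than duplicating existing records.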
Results: Our predictive model included 12 variables: age, body mass index (BMI), quadrant, clock direction, distance of the tumor from the nipple, tumor morphology on molybdenum-target mammography, glandular content, tumor size, ER, PR, HER2, and Ki-67. Our final model achieved an AUC of 0.801 and an accuracy of 70.3%. When logistic regression was used to establish the model, the AUCs in the modeling and validation groups were 0.660 and 0.580, respectively. Analyzing the original location of the tumor with the combined vector-and-quadrant method was more precise than using the vector alone or the quadrant alone (AUC 0.801 vs. 0.791 vs. 0.701; accuracy 70.3 vs. 70.3 vs. 63.6%).
Conclusions: Our model is more reliable and stable, and can assist doctors in predicting SLN metastasis in breast cancer patients before operation.
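The two evaluation measures reported here, accuracy from the confusion matrix and AUC, can be computed as in the following sketch. The labels, scores, and the 0.5 decision threshold are hypothetical toy values, not study data, and the helper names are assumptions made for illustration. AUC is computed in its rank form: the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as one half.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Share of correctly classified samples."""
    tp, fp, tn, fn = confusion_matrix(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: 1 = SLN metastasis, 0 = no metastasis.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(confusion_matrix(y_true, y_pred))
print(accuracy(y_true, y_pred))
print(auc(y_true, scores))
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two measures can move differently between models.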

Keywords: bagged-trees; breast cancer; metastasis prediction; model; sentinel lymph nodes.