Machine Learning Algorithm for Distinguishing Ductal Carcinoma In Situ from Invasive Breast Cancer

Cancers (Basel). 2022 May 15;14(10):2437. doi: 10.3390/cancers14102437.

Abstract

Purpose: Given that early identification of breast cancer type allows for less-invasive therapies, we aimed to develop a machine learning model to discriminate between ductal carcinoma in situ (DCIS) and minimally invasive breast cancer (MIBC).

Methods: In this retrospective study, the health records of 420 women who underwent biopsies between 2010 and 2020 to confirm breast cancer were collected. A trained XGBoost algorithm was used to classify cancers as either DCIS or MIBC using clinical characteristics, mammographic findings, ultrasonographic findings, and histopathological features. Its performance was measured against other methods using area under the receiver operating characteristic curve (AUC), sensitivity, specificity, accuracy, precision, and F1 score.
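The evaluation metrics named above can be illustrated with a minimal sketch. It assumes binary labels with MIBC coded as the positive class (1) and DCIS as 0, which is a convention chosen here for illustration and not stated in the abstract; the function names and toy values are hypothetical and are not the study data.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Confusion-matrix metrics. Assumption: 1 = MIBC (positive), 0 = DCIS."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # true positive rate (recall)
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    precision = tp / (tp + fp) if tp + fp else 0.0    # positive predictive value
    accuracy = (tp + tn) / len(y_true)
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example with made-up scores (not the study data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption

metrics = binary_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Sensitivity, specificity, accuracy, precision, and F1 score depend on the chosen decision threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason both kinds of metric are typically reported together.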

Results: Of the 420 patients (mean [standard deviation] age, 57.1 [12.0] years), 357 were used to train the model and 63 to test it. The model performed well when feature importance was determined, reaching an accuracy of 0.84 (95% confidence interval [CI], 0.76-0.91), an AUC of 0.93 (95% CI, 0.87-0.95), a specificity of 0.75 (95% CI, 0.67-0.83), and a sensitivity of 0.91 (95% CI, 0.76-0.94).
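The abstract does not state how the 95% confidence intervals were obtained. One common approach, shown here purely as an assumption and not as the authors' method, is a percentile bootstrap over the test cases. The helper names and the synthetic labels below are hypothetical; the sample size of 63 simply mirrors the size of the test set.

```python
import random
from typing import Callable, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of cases where the predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def bootstrap_ci(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    metric: Callable[[Sequence[int], Sequence[int]], float],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for a metric (assumed method, not the paper's)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        # Resample case indices with replacement, keeping label pairs together.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * (n_resamples - 1))]
    upper = stats[int((1 - alpha / 2) * (n_resamples - 1))]
    return lower, upper


# Synthetic labels only (not the study data): 63 cases, 10 of them misclassified.
y_true = [1] * 30 + [0] * 33
y_pred = [0] * 5 + [1] * 25 + [1] * 5 + [0] * 28

point = accuracy(y_true, y_pred)
lower, upper = bootstrap_ci(y_true, y_pred, accuracy)
print(f"accuracy = {point:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

With a test set of only 63 cases, intervals of this kind are necessarily wide, which is consistent with the spread of the intervals reported above.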

Conclusion: The XGBoost model, combining clinical, mammographic, ultrasonographic, and histopathologic findings, can be used to discriminate DCIS from MIBC with an accuracy equivalent to that of experienced radiologists, thereby giving patients the widest range of therapeutic options.

Keywords: XGBoost; breast cancer; ductal carcinoma in situ; mammographic; minimally invasive breast cancer; ultrasonographic.