Improving the robustness and stability of a machine learning model for breast cancer prognosis through the use of multi-modal classifiers

Sci Rep. 2023 Mar 11;13(1):4079. doi: 10.1038/s41598-023-30143-8.

Abstract

Breast cancer is a deadly disease with a high mortality rate among PAN cancers. The advancements in biomedical information retrieval techniques have been beneficial in developing early prognosis and diagnosis systems for cancer patients. These systems provide the oncologist with plenty of information from several modalities to make the correct and feasible treatment plan for breast cancer patients and protect them from unnecessary therapies and their toxic side effects. The cancer patient's related information can be collected using various modalities like clinical, copy number variation, DNA-methylation, microRNA sequencing, gene expression, and histopathological whole slide images. High dimensionality and heterogeneity in these modalities demand the development of some intelligent systems to understand related features to the prognosis and diagnosis of diseases and make correct predictions. In this work, we have studied some end-to-end systems having two main components : (a) dimensionality reduction techniques applied to original features from different modalities and (b) classification techniques applied to the fusion of reduced feature vectors from different modalities for automatic predictions of breast cancer patients into two categories: short-time and long-time survivors. Principal component analysis (PCA) and variational auto-encoders (VAEs) are used as the dimensionality reduction techniques, followed by support vector machines (SVM) or random forest as the machine learning classifiers. The study utilizes raw, PCA, and VAE extracted features of the TCGA-BRCA dataset from six different modalities as input to the machine learning classifiers. We conclude this study by suggesting that adding more modalities to the classifiers provides complementary information to the classifier and increases the stability and robustness of the classifiers. In this study, the multimodal classifiers have not been validated on primary data prospectively.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Breast / pathology
  • Breast Neoplasms* / pathology
  • DNA Copy Number Variations
  • Female
  • Humans
  • Machine Learning
  • Prognosis
  • Support Vector Machine