Machine learning analysis identifies genes differentiating triple negative breast cancers

Sci Rep. 2020 Jun 26;10(1):10464. doi: 10.1038/s41598-020-67525-1.

Abstract

Triple negative breast cancer (TNBC) is one of the most aggressive forms of breast cancer (BC) and has the highest mortality, owing to a high rate of relapse, resistance, and the lack of an effective treatment. Various molecular approaches have been used to target TNBC, but with little success. Here, using machine learning algorithms, we analyzed the available BC data from The Cancer Genome Atlas (TCGA) and identified two potential genes, TBC1D9 (TBC1 domain family member 9) and MFGE8 (Milk Fat Globule-EGF Factor 8 Protein), that could successfully differentiate TNBC from non-TNBC, irrespective of tumor heterogeneity. TBC1D9 is under-expressed in TNBC patients compared with non-TNBC patients, while MFGE8 is over-expressed. Overexpression of TBC1D9 is associated with a better prognosis, whereas overexpression of MFGE8 correlates with a poor prognosis. Protein-protein interaction analysis by affinity purification mass spectrometry (AP-MS) and proximity biotinylation (BioID) experiments identified a role for TBC1D9 in maintaining cellular integrity, whereas MFGE8 may be involved in various tumor survival processes. These promising genes could serve as biomarkers for TNBC and deserve further investigation, as they have the potential to be developed as therapeutic targets for TNBC.
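The abstract does not say which machine learning algorithms were used, so the following is only an illustrative sketch of the general idea of ranking genes by how well their expression separates two sample groups. It uses a simple univariate score (difference in group means divided by the pooled standard deviation), not the authors' method, and runs on synthetic data. The gene names (GENE_DOWN, GENE_UP, GENE_NOISE_1, GENE_NOISE_2), the sample sizes, and the helper functions are all hypothetical.

```python
import random
import statistics


def separation_score(values_a, values_b):
    """Absolute difference in means divided by the pooled standard deviation."""
    mean_a, mean_b = statistics.mean(values_a), statistics.mean(values_b)
    var_a, var_b = statistics.variance(values_a), statistics.variance(values_b)
    pooled_sd = ((var_a + var_b) / 2) ** 0.5
    return abs(mean_a - mean_b) / pooled_sd


def rank_genes(expression, labels):
    """Rank genes from most to least separating between the two groups.

    expression: dict mapping a gene name to a list of per-sample values
    labels: list of booleans, True for samples in the first group
    """
    scores = {}
    for gene, values in expression.items():
        group_a = [v for v, flag in zip(values, labels) if flag]
        group_b = [v for v, flag in zip(values, labels) if not flag]
        scores[gene] = separation_score(group_a, group_b)
    return sorted(scores, key=scores.get, reverse=True)


# Synthetic data only: 50 "TNBC-like" and 50 "non-TNBC-like" samples.
rng = random.Random(0)
labels = [True] * 50 + [False] * 50
expression = {
    # Hypothetical gene that is lower in the first group.
    "GENE_DOWN": [rng.gauss(2.0 if flag else 6.0, 1.0) for flag in labels],
    # Hypothetical gene that is higher in the first group.
    "GENE_UP": [rng.gauss(6.0 if flag else 2.0, 1.0) for flag in labels],
    # Hypothetical genes with no group difference.
    "GENE_NOISE_1": [rng.gauss(4.0, 1.0) for _ in labels],
    "GENE_NOISE_2": [rng.gauss(4.0, 1.0) for _ in labels],
}

ranking = rank_genes(expression, labels)
print(ranking)
```

In this toy setting the two genes with a built-in group difference rank above the noise genes. A real analysis of TCGA expression data would need normalization, cross-validation, and a multivariate classifier, none of which are shown here.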

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Antigens, Surface / genetics
  • Biomarkers, Tumor / genetics
  • Calcium-Binding Proteins / genetics
  • Female
  • Gene Expression Regulation, Neoplastic / genetics
  • HEK293 Cells
  • Humans
  • Machine Learning
  • Neoplasm Recurrence, Local / genetics
  • Prognosis
  • Transcriptome / genetics
  • Triple Negative Breast Neoplasms / genetics*
  • Triple Negative Breast Neoplasms / pathology

Substances

  • Antigens, Surface
  • Biomarkers, Tumor
  • Calcium-Binding Proteins