Identification of Breast Cancer Metastasis Markers from Gene Expression Profiles Using Machine Learning Approaches

Genes (Basel). 2023 Sep 20;14(9):1820. doi: 10.3390/genes14091820.

Abstract

Cancer metastasis accounts for approximately 90% of cancer deaths, and elucidating markers of metastasis is the first step in its prevention. To characterize metastasis marker genes (MGs) of breast cancer, XGBoost models that classify metastasis status were trained with gene expression profiles from TCGA. Then, a metastasis score (MS) was assigned to each gene by calculating the inner product between its feature importance and the AUC performance of the models. As a result, the 54, 202, and 357 genes with the highest MS were characterized as MGs at empirical p-value cutoffs of 0.001, 0.005, and 0.01, respectively. The three sets of MGs were compared with those from existing metastasis marker databases, and most comparisons yielded significant results (p-value < 0.05). The MGs were also significantly enriched in biological processes associated with breast cancer metastasis. Three MGs (SPPL2C, KRT23, and RGS7) showed highly significant results (p-value < 0.01) in the survival analysis. The MGs that could not be identified by statistical analysis (e.g., GOLM1, ELAVL1, UBP1, and AZGP1), as well as the MGs with the highest MS (e.g., ZNF676, FAM163B, LDOC2, IRF1, and STK40), were verified against the literature. Additionally, we checked how close the MGs were to each other in protein-protein interaction networks. We expect that the characterized markers will help in understanding and preventing breast cancer metastasis.
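The scoring step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that the inner product is taken across models (each gene's feature importance in a model weighted by that model's AUC), that the empirical p-value is the fraction of null scores at least as large as the observed score, and that a gene is selected when its p-value falls strictly below the cutoff. The abstract does not say how the null scores were generated, so `null_scores` is a hypothetical input, and the function names and gene labels are illustrative only.

```python
from typing import List, Sequence


def metastasis_scores(importances: Sequence[Sequence[float]],
                      aucs: Sequence[float]) -> List[float]:
    """Inner product of per-model feature importances and per-model AUCs.

    importances[k][g] is the importance of gene g in model k (assumed layout);
    aucs[k] is the AUC of model k.
    """
    if len(importances) != len(aucs):
        raise ValueError("need one AUC value per model")
    n_genes = len(importances[0])
    scores = [0.0] * n_genes
    for row, auc in zip(importances, aucs):
        if len(row) != n_genes:
            raise ValueError("every model must score the same genes")
        for g, importance in enumerate(row):
            scores[g] += importance * auc
    return scores


def empirical_p_values(scores: Sequence[float],
                       null_scores: Sequence[float]) -> List[float]:
    """Share of null scores >= each observed score, with a +1 correction.

    The +1 correction (an assumption here) keeps p-values strictly positive.
    """
    n_null = len(null_scores)
    return [
        (1 + sum(1 for x in null_scores if x >= s)) / (n_null + 1)
        for s in scores
    ]


def select_markers(genes: Sequence[str],
                   p_values: Sequence[float],
                   cutoff: float) -> List[str]:
    """Keep genes whose empirical p-value is below the cutoff (assumed strict)."""
    return [g for g, p in zip(genes, p_values) if p < cutoff]


if __name__ == "__main__":
    # Toy numbers for illustration only; not data from the study.
    genes = ["GENE_A", "GENE_B", "GENE_C"]
    importances = [[0.5, 0.3, 0.2],   # model 1
                   [0.1, 0.6, 0.3]]   # model 2
    aucs = [0.8, 0.6]
    null_scores = [0.1, 0.2, 0.3, 0.4, 0.5]  # hypothetical null distribution

    ms = metastasis_scores(importances, aucs)
    p = empirical_p_values(ms, null_scores)
    print(select_markers(genes, p, cutoff=0.2))
```

In practice the importances would come from the trained XGBoost models and the AUCs from held-out evaluation; weighting by AUC means that genes favored by better-performing models receive higher scores.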

Keywords: XGBoost; breast cancer; feature importance; gene expression; machine learning; metastasis marker.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Breast Neoplasms* / pathology
  • Female
  • Humans
  • Machine Learning
  • Melanoma, Cutaneous Malignant
  • Membrane Proteins / genetics
  • Neoplasms, Second Primary*
  • Protein Interaction Maps
  • RGS Proteins* / genetics
  • Transcriptome

Substances

  • GOLM1 protein, human
  • Membrane Proteins
  • RGS7 protein, human
  • RGS Proteins

Grants and funding

This work was supported by a National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (NRF-2022R1C1C1008823) and by a grant from the Ministry of Food and Drug Safety given in 2021 (21162MFDS045).