Unsupervised Learning Framework With Multidimensional Scaling in Predicting Epithelial-Mesenchymal Transitions

IEEE/ACM Trans Comput Biol Bioinform. 2021 Nov-Dec;18(6):2714-2723. doi: 10.1109/TCBB.2020.2992605. Epub 2021 Dec 8.

Abstract

Clustering tumor metastasis samples from whole-genome gene expression data remains challenging, particularly when the number of experimental samples is small and the number of genes is very large. Here we focus on predicting the epithelial-mesenchymal transition (EMT), an underlying mechanism of tumor metastasis, rather than metastasis itself, to avoid confounding effects from the uncertainties that various factors introduce. In this paper, we propose a novel model for predicting EMT that is based on multidimensional scaling (MDS) and integrates entropy and random matrix detection strategies to determine the optimal number of dimensions in the low-dimensional space. We validated the proposed model on gene expression data from breast cancer EMT samples, and the experimental results demonstrated its superiority over state-of-the-art clustering methods. Furthermore, we developed a novel feature extraction method for selecting significant genes and predicting tumor metastasis. The source code is available at "https://github.com/yushanqiu/yushan.qiu-szu.edu.cn".
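The abstract does not specify how MDS and the entropy and random matrix criteria are combined. The sketch below is a minimal illustration under our own assumptions, not the authors' implementation (which is in the linked repository): classical MDS is paired with two common dimension criteria, counting Gram-matrix eigenvalues above the Marchenko-Pastur upper edge (a standard random-matrix noise threshold) and the entropy-based effective rank of the spectrum. The function names `classical_mds`, `mp_dimension`, and `entropy_dimension` and the synthetic two-group expression matrix are hypothetical.

```python
import numpy as np


def classical_mds(sq_dist: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Classical (Torgerson) MDS on an n x n matrix of squared distances.

    Returns the eigenvalues of the double-centered Gram matrix in descending
    order and the matching n x n coordinate matrix (column j = dimension j).
    """
    n = sq_dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centering converts squared distances into inner products.
    gram = -0.5 * centering @ sq_dist @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Clip tiny negative eigenvalues from round-off before the square root.
    coords = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    return eigvals, coords


def mp_dimension(eigvals: np.ndarray, n_samples: int, n_genes: int) -> int:
    """Random-matrix criterion (assumed here): count eigenvalues of Z Z^T / n_genes
    above the Marchenko-Pastur upper edge for aspect ratio n_samples / n_genes."""
    upper_edge = (1.0 + np.sqrt(n_samples / n_genes)) ** 2
    return int(np.sum(eigvals / n_genes > upper_edge))


def entropy_dimension(eigvals: np.ndarray) -> float:
    """Entropy criterion (assumed here): effective rank, i.e. exp of the
    Shannon entropy of the normalized positive spectrum."""
    pos = eigvals[eigvals > 1e-12 * eigvals.max()]
    p = pos / pos.sum()
    return float(np.exp(-np.sum(p * np.log(p))))


# Hypothetical data: 40 samples x 2000 genes, two groups that differ
# in the mean of the first 200 genes.
rng = np.random.default_rng(0)
n_samples, n_genes = 40, 2000
labels = np.repeat([0, 1], n_samples // 2)
expr = rng.standard_normal((n_samples, n_genes))
expr[labels == 1, :200] += 1.5

# Standardize each gene across samples; columns are then centered, so the
# double-centered Gram matrix from MDS equals z @ z.T.
z = (expr - expr.mean(axis=0)) / expr.std(axis=0)
sq_norms = np.sum(z**2, axis=1)
sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * z @ z.T

eigvals, coords = classical_mds(sq_dist)
k = mp_dimension(eigvals, n_samples, n_genes)
embedding = coords[:, :k]  # low-dimensional input for any clustering method
print(k, round(entropy_dimension(eigvals), 2))
```

Standardizing each gene makes the columns of `z` centered, so the MDS Gram matrix is exactly `z @ z.T`, and its eigenvalues divided by the number of genes can be compared with the Marchenko-Pastur bulk. The comparison is only approximate, because the per-gene means and variances are themselves estimated from the few available samples.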

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Breast Neoplasms / genetics
  • Breast Neoplasms / pathology
  • Cluster Analysis
  • Computational Biology / methods*
  • Epithelial-Mesenchymal Transition / genetics*
  • Female
  • Humans
  • Multidimensional Scaling Analysis*
  • Neoplasm Metastasis / genetics
  • Transcriptome / genetics
  • Unsupervised Machine Learning*