NESM: a network embedding method for tumor stratification by integrating multi-omics data

G3 (Bethesda). 2022 Nov 4;12(11):jkac243. doi: 10.1093/g3journal/jkac243.

Abstract

Tumor stratification plays an important role in cancer diagnosis and individualized treatment. Recent developments in high-throughput sequencing technologies have produced huge amounts of multi-omics data, making it possible to stratify cancer types using multiple molecular datasets. We introduce a Network Embedding method for tumor Stratification by integrating Multi-omics data. Network Embedding method for tumor Stratification by integrating Multi-omics pregroup the samples, integrate the gene features and somatic mutation corresponding to cancer types within each group to construct patient features, and then integrate all groups to obtain comprehensive patient information. The gene features contain network topology information, because it is extracted by integrating deoxyribonucleic acid methylation, messenger ribonucleic acid expression data, and protein-protein interactions through network embedding method. On the one hand, a supervised learning method Light Gradient Boosting Machine is used to classify cancer types based on patient features. When compared with other 3 methods, Network Embedding method for tumor Stratification by integrating Multi-omics has the highest AUC in most cancer types. The average AUC for stratifying cancer types is 0.91, indicating that the patient features extracted by Network Embedding method for tumor Stratification by integrating Multi-omics are effective for tumor stratification. On the other hand, an unsupervised clustering algorithm Density-Based Spatial Clustering of Applications with Noise is utilized to divide single cancer subtypes. The vast majority of the subtypes identified by Network Embedding method for tumor Stratification by integrating Multi-omics are significantly associated with patient survival.

Keywords: cancer subtype; embedding network; multi-omics; pan-cancer.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Cluster Analysis
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Neoplasms* / diagnosis
  • Neoplasms* / genetics