Network based stratification of major cancers by integrating somatic mutation and gene expression data

PLoS One. 2017 May 16;12(5):e0177662. doi: 10.1371/journal.pone.0177662. eCollection 2017.

Abstract

The stratification of cancer into subtypes that are significantly associated with clinical outcomes is beneficial for targeted prognosis and treatment. In this study, we integrated somatic mutation and gene expression data to identify clusters of patients. In contrast to previous studies, we constructed cancer-type-specific significant co-expression networks (SCNs). Earlier approaches such as the network-based stratification (NBS) method instead use a single fixed gene network for all cancers, which ignores cancer heterogeneity. For each type of cancer, the gene expression data were used to construct the SCN, while the somatic mutation data were mapped onto the network, propagated, and used for further clustering. For the clustering, we adopted an improved version of network-regularized non-negative matrix factorization (netNMF), termed netNMF_HC, for more precise classification. We applied our method to several datasets, including ovarian cancer (OV), lung adenocarcinoma (LUAD) and uterine corpus endometrial carcinoma (UCEC) cohorts from The Cancer Genome Atlas (TCGA) project. We evaluated the ability of our method to identify survival-relevant subtypes and compared it with the NBS method, which uses a priori networks and the netNMF algorithm. The proposed algorithm outperformed the NBS method in identifying informative cancer subtypes significantly associated with clinical outcomes in most of the cancer types we studied. In particular, our method identified survival-associated UCEC subtypes that the NBS method did not. Our analysis indicated that valid patient subtyping can be achieved from mutation data by combining cancer-type-specific SCNs with netNMF_HC, owing to cancer-specific co-expression patterns and more precise clustering.
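The three stages summarized above can be illustrated in NumPy: a co-expression network built from expression data, mutation propagation over that network, and network-regularized NMF clustering. This is a minimal sketch under stated assumptions and is not the authors' implementation. A fixed absolute-correlation cutoff (`threshold`) stands in for the significance test used to build SCNs. Propagation is a random walk with restart with a hypothetical restart parameter `alpha`. Clustering uses plain graph-regularized NMF with multiplicative updates. The refinements that distinguish netNMF_HC are not reproduced, and any normalization of propagated profiles is omitted. All function names and parameter values are hypothetical.

```python
import numpy as np


def coexpression_network(expr, threshold=0.8):
    """Gene-gene adjacency from |Pearson r| >= threshold.

    expr: genes x samples matrix. Assumption: a fixed cutoff replaces
    the significance-based SCN construction described in the paper.
    """
    corr = np.nan_to_num(np.corrcoef(expr))  # constant genes -> r = 0
    adj = (np.abs(corr) >= threshold).astype(float)
    np.fill_diagonal(adj, 0.0)  # no self-loops
    return adj


def propagate(mutations, adj, alpha=0.7, tol=1e-6, max_iter=1000):
    """Random walk with restart: F <- alpha * F @ W + (1 - alpha) * F0.

    mutations: patients x genes binary matrix (F0); W is the
    row-normalized adjacency (hypothetical normalization choice).
    """
    deg = adj.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0  # avoid division by zero for isolated genes
    w = adj / deg
    f0 = mutations.astype(float)
    f = f0.copy()
    for _ in range(max_iter):
        f_next = alpha * f @ w + (1 - alpha) * f0
        if np.abs(f_next - f).max() < tol:
            return f_next
        f = f_next
    return f


def net_nmf(x, adj, k, lam=1.0, n_iter=500, seed=0, eps=1e-10):
    """Graph-regularized NMF: min ||X - W H||_F^2 + lam * tr(W^T L W).

    x: genes x patients (non-negative); L = D - A is the gene-network
    Laplacian, so genes linked in the network get similar factor rows.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_patients = x.shape
    w = rng.random((n_genes, k))
    h = rng.random((k, n_patients))
    deg = adj.sum(axis=1, keepdims=True)  # diagonal of D as a column
    for _ in range(n_iter):
        w *= (x @ h.T + lam * adj @ w) / (w @ h @ h.T + lam * deg * w + eps)
        h *= (w.T @ x) / (w.T @ w @ h + eps)
    return w, h


def stratify(expr, mutations, k, threshold=0.8):
    """Return one cluster label per patient (argmax over NMF factors)."""
    adj = coexpression_network(expr, threshold)
    f = propagate(mutations, adj)
    _, h = net_nmf(f.T, adj, k)
    return h.argmax(axis=0)


# Synthetic usage: 50 genes, 40 patients (random data, for illustration only).
rng = np.random.default_rng(1)
expr = rng.normal(size=(50, 40))
mutations = (rng.random((40, 50)) < 0.05).astype(int)
labels = stratify(expr, mutations, k=3, threshold=0.3)
print(labels.shape)  # (40,)
```

Regularizing the gene factors `W` with the network Laplacian is what ties the factorization to the network. Under this sketch, swapping a fixed a priori network for a cancer-type-specific one only changes the `adj` argument.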

MeSH terms

  • Algorithms
  • Cluster Analysis
  • Computational Biology / methods
  • Databases, Nucleic Acid
  • Gene Expression Profiling
  • Gene Expression Regulation, Neoplastic*
  • Gene Regulatory Networks*
  • Humans
  • Mutation*
  • Neoplasms / genetics*
  • Neoplasms / mortality
  • Prognosis
  • Survival Analysis
  • Transcriptome*
  • Workflow

Grants and funding

This work was supported by the Natural Science Foundation of China under Grants 61571341 and 61201312, and the Natural Science Foundation of Shaanxi Province in China under Grants 2016JM6047 and 2015JM6275. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.