MIDAS: Mining differentially activated subpaths of KEGG pathways from multi-class RNA-seq data

Methods. 2017 Jul 15:124:13-24. doi: 10.1016/j.ymeth.2017.05.026. Epub 2017 Jun 1.

Abstract

Pathway-based analysis of high-throughput transcriptome data is a widely used approach to investigate biological mechanisms. Since a pathway consists of multiple functions, a recent approach is to determine condition-specific sub-pathways, or subpaths. However, there are several challenges. First, few existing methods utilize explicit gene expression information from RNA-seq. More importantly, subpath activity is usually computed as an average of statistical scores, e.g., correlations, of the edges in a candidate subpath, which fails to reflect gene expression quantity information. In addition, none of the existing methods can handle multiple phenotypes. To address these technical problems, we designed and implemented an algorithm, MIDAS, that determines condition-specific subpaths, each of which has different activities across multiple phenotypes. MIDAS makes full use of gene expression quantity information, together with network centrality information, to determine condition-specific subpaths. To test the performance of our tool, we used TCGA breast cancer RNA-seq gene expression profiles with five molecular subtypes, and 36 differentially activated subpaths were determined. The utility of our method, MIDAS, was demonstrated in four ways. First, all 36 subpaths are well supported by the literature. Second, we showed that these subpaths had good discriminant power for classifying the five cancer subtypes. Third, they also had prognostic power in terms of survival analysis. Finally, in a performance comparison of MIDAS with a recent subpath prediction method, PATHOME, our method identified more subpaths and many more genes that are well supported by the literature.
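To make the idea of a subpath with "different activities across multiple phenotypes" concrete, the following minimal sketch scores a candidate subpath directly from expression quantities across more than two classes. It is not the MIDAS algorithm: the abstract does not specify the scoring procedure, so a one-way ANOVA F statistic on the mean log2 expression of the subpath's genes is swapped in purely for illustration, and network centrality is ignored. The function names, gene names, and toy expression values are all hypothetical.

```python
import math
from collections import defaultdict


def subpath_activity(expr, genes):
    """Mean log2(x + 1) expression of the subpath's genes in one sample.

    Illustrative activity measure only; not the MIDAS definition.
    """
    return sum(math.log2(expr[g] + 1.0) for g in genes) / len(genes)


def anova_f(groups):
    """One-way ANOVA F statistic for a dict mapping class label -> values."""
    values = [x for vals in groups.values() for x in vals]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {label: sum(vals) / len(vals) for label, vals in groups.items()}
    ss_between = sum(
        len(vals) * (means[label] - grand_mean) ** 2
        for label, vals in groups.items()
    )
    ss_within = sum(
        (x - means[label]) ** 2
        for label, vals in groups.items()
        for x in vals
    )
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def score_subpath(samples, genes):
    """Group per-sample activities by class label and return the F statistic."""
    groups = defaultdict(list)
    for label, expr in samples:
        groups[label].append(subpath_activity(expr, genes))
    return anova_f(groups)


# Hypothetical toy data: (class label, gene -> expression quantity).
samples = [
    ("LumA", {"G1": 1, "G2": 3, "G3": 15}),
    ("LumA", {"G1": 3, "G2": 3, "G3": 15}),
    ("Basal", {"G1": 31, "G2": 63, "G3": 15}),
    ("Basal", {"G1": 63, "G2": 63, "G3": 7}),
    ("HER2", {"G1": 7, "G2": 15, "G3": 7}),
    ("HER2", {"G1": 15, "G2": 15, "G3": 15}),
]

differential = score_subpath(samples, ["G1", "G2"])  # varies strongly by class
flat = score_subpath(samples, ["G3"])                # roughly constant
print(differential, flat)
```

In this toy setting, the subpath whose genes change across the three classes receives a much larger score than the one that stays roughly constant. Any multi-class statistic could stand in for the F statistic here.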

Availability: http://biohealth.snu.ac.kr/software/MIDAS/.

Keywords: KEGG pathway; Multi-class; Network centrality; RNA-seq; Subpath.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Breast Neoplasms / classification
  • Breast Neoplasms / genetics*
  • Breast Neoplasms / metabolism
  • Breast Neoplasms / mortality
  • Data Mining / methods
  • Data Mining / statistics & numerical data*
  • Female
  • Gene Expression Profiling
  • Gene Expression Regulation, Neoplastic*
  • Gene Regulatory Networks*
  • High-Throughput Nucleotide Sequencing
  • Humans
  • RNA, Neoplasm / genetics*
  • RNA, Neoplasm / metabolism
  • Sequence Analysis, RNA
  • Signal Transduction
  • Software
  • Survival Analysis
  • Transcriptome

Substances

  • RNA, Neoplasm