Finding disagreement pathway signatures and constructing an ensemble model for cancer classification

Sci Rep. 2017 Aug 30;7(1):10044. doi: 10.1038/s41598-017-10258-5.

Abstract

With advances in high-throughput molecular profiling techniques, cancer classification at the molecular level has become a relatively routine research procedure. However, in gene expression studies the number of genes typically far exceeds the number of samples. Existing gene selection methods are almost all based on statistics and machine learning, and they overlook relevant biological principles and knowledge when working with biological data. Here, we propose a robust ensemble learning paradigm that incorporates information from multiple pathways to predict cancer classes. We compare the proposed method with other methods, such as Elastic SCAD and PPDMF, and evaluate their classification performance. The results show that the proposed method achieves higher performance on most metrics and performs robustly. We further investigate the biological mechanisms of the ensemble feature genes. The results demonstrate that the ensemble feature genes are associated with drug targets and clinically relevant cancers. In addition, functional annotation identifies several core biological pathways and biological processes underlying clinically relevant phenotypes. Overall, our research may provide a new perspective for further study of the molecular activities and manifestations of cancer.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Anthracyclines / therapeutic use
  • Antineoplastic Agents / therapeutic use
  • Atlases as Topic
  • Datasets as Topic
  • Gene Expression Profiling
  • Gene Expression Regulation, Neoplastic*
  • Gene Regulatory Networks*
  • Humans
  • Neoplasm Proteins / genetics*
  • Neoplasm Proteins / metabolism
  • Neoplasms / classification*
  • Neoplasms / diagnosis
  • Neoplasms / drug therapy
  • Neoplasms / genetics
  • Oligonucleotide Array Sequence Analysis
  • Signal Transduction
  • Support Vector Machine*
  • Taxoids / therapeutic use

Substances

  • Anthracyclines
  • Antineoplastic Agents
  • Neoplasm Proteins
  • Taxoids