Evaluation and integration of cancer gene classifiers: identification and ranking of plausible drivers

Sci Rep. 2015 May 11;5:10204. doi: 10.1038/srep10204.

Abstract

The number of mutated genes in cancer cells is far larger than the number of mutations that drive cancer. The difficulty this creates for identifying relevant alterations has stimulated the development of various computational approaches to distinguishing drivers from bystanders. We develop an ensemble classifier (EC) machine learning method that integrates 10 publicly available classifiers, and apply it to breast and ovarian cancer. In particular, we find the following: (1) Using both standard and non-standard metrics, EC almost always outperforms single-method classifiers, often by wide margins. (2) Of the 50 highest-ranked genes for breast (ovarian) cancer, 34 (30) are associated with other cancers in the OMIM, CGC, or NCG database (P < 10^-22). (3) Another 10, for both breast and ovarian cancer, have been identified by GWAS. (4) Several of the remaining genes are biologically plausible, including a protein kinase that regulates the Fra-1 transcription factor, which is overexpressed in ER-negative breast cancer cells, and Fyn, which is overexpressed in pancreatic and prostate cancer, among others. Biological implications are briefly discussed. Source code and detailed results are available at http://www.visantnet.org/misi/driver_integration.zip.
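
The database overlap reported in finding (2) is the kind of result often assessed with a one-sided hypergeometric test. The abstract does not say which test the authors used, so the following is only a minimal sketch of that general technique, not their method. The function name, the background size of 20,000 genes, and the count of 1,500 database-annotated genes are hypothetical values chosen for illustration; only the 34-of-50 overlap comes from the abstract.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """Return P(X >= k) for a hypergeometric draw.

    N: total genes in the background (hypothetical here)
    K: background genes annotated in the reference databases (hypothetical here)
    n: size of the ranked list being tested
    k: observed number of annotated genes in that list
    """
    total = comb(N, n)
    # Sum the exact counts of all outcomes with at least k annotated genes.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / total


# 34 of the top 50 genes annotated (from the abstract);
# K and N below are assumed for illustration, not taken from the paper.
p = hypergeom_upper_tail(k=34, n=50, K=1500, N=20000)
print(f"upper-tail P under the assumed background: {p:.3g}")
```

Because the result depends strongly on the assumed background and annotation counts, the printed value should not be compared directly with the P value in the abstract.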

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Animals
  • Databases, Genetic*
  • Genes, Neoplasm*
  • Humans
  • Machine Learning*
  • Mutation*
  • Neoplasm Proteins* / classification
  • Neoplasm Proteins* / genetics
  • United States

Substances

  • Neoplasm Proteins