Improved detection of gene fusions by applying statistical methods reveals oncogenic RNA cancer drivers

Proc Natl Acad Sci U S A. 2019 Jul 30;116(31):15524-15533. doi: 10.1073/pnas.1900391116. Epub 2019 Jul 15.

Abstract

The extent to which gene fusions function as drivers of cancer remains a critical open question. Current algorithms do not sufficiently identify false-positive fusions arising during library preparation, sequencing, and alignment. Here, we introduce Data-Enriched Efficient PrEcise STatistical fusion detection (DEEPEST), an algorithm that uses statistical modeling to minimize false positives while increasing the sensitivity of fusion detection. In 9,946 tumor RNA-sequencing datasets from The Cancer Genome Atlas (TCGA) across 33 tumor types, DEEPEST identifies 31,007 fusions, 30% more than identified by other methods, while calling 10-fold fewer false-positive fusions in nontransformed human tissues. We leverage the increased precision of DEEPEST to discover fundamental cancer biology. Namely, we identify 888 candidate oncogenes based on their overrepresentation in DEEPEST calls, along with 1,078 previously unreported fusions involving long intergenic noncoding RNAs, demonstrating a previously unappreciated prevalence and potential for function of such fusions. DEEPEST also reveals a high enrichment for fusions involving oncogenes in cancers, including ovarian cancer, which has had minimal treatment advances in recent decades, finding that more than 50% of tumors harbor gene fusions predicted to be oncogenic. Specific protein domains are enriched in DEEPEST calls, indicating a global selection for fusion functionality: kinase domains are enriched nearly 2-fold in DEEPEST calls relative to what is expected by chance, as are domains involved in (anaerobic) metabolism and DNA binding. The statistical algorithms, population-level analytic framework, and the biological conclusions of DEEPEST call for increased attention to gene fusions as drivers of cancer and for future research into using fusions for targeted therapy.
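
The abstract reports enrichment as a fold change relative to chance but does not spell out the calculation. As a minimal sketch of how such a quantity can be computed, the Python below derives a fold enrichment and a one-sided binomial tail probability. This is an illustration only, not the DEEPEST method: the function names, the choice of a binomial null model, and all counts are assumptions made up for the example.

```python
from math import comb


def fold_enrichment(observed_hits, total_calls, background_rate):
    """Observed fraction of calls with a feature, divided by the chance rate."""
    return (observed_hits / total_calls) / background_rate


def binomial_upper_tail(observed_hits, total_calls, background_rate):
    """P(X >= observed_hits) for X ~ Binomial(total_calls, background_rate)."""
    return sum(
        comb(total_calls, k)
        * background_rate ** k
        * (1 - background_rate) ** (total_calls - k)
        for k in range(observed_hits, total_calls + 1)
    )


# Hypothetical numbers, NOT taken from the paper: 20 of 100 fusion calls
# carry a given domain, against an assumed 10% background rate.
hits, calls, rate = 20, 100, 0.10

fe = fold_enrichment(hits, calls, rate)
p_value = binomial_upper_tail(hits, calls, rate)

print(f"fold enrichment: {fe:.2f}")
print(f"one-sided binomial p-value: {p_value:.4g}")
```

A fold enrichment near 2 means the feature appears about twice as often as the background rate predicts; the tail probability indicates how surprising that excess would be under the assumed null model.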

Keywords: TCGA; bioinformatics; cancer genomics; gene fusion; pan-cancer analysis.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Base Sequence
  • Databases, Genetic
  • Gene Fusion*
  • Genomic Instability
  • Humans
  • Neoplasms / genetics*
  • Oncogenes*
  • Proteome / metabolism
  • RNA, Long Noncoding / genetics
  • RNA, Long Noncoding / metabolism
  • RNA, Neoplasm / genetics*
  • Statistics as Topic*

Substances

  • Proteome
  • RNA, Long Noncoding
  • RNA, Neoplasm