FuGePrior: A novel gene fusion prioritization algorithm based on accurate fusion structure analysis in cancer RNA-seq samples

BMC Bioinformatics. 2017 Jan 23;18(1):58. doi: 10.1186/s12859-016-1450-6.

Abstract

Background: The latest next-generation sequencing technologies have opened a new era of genomic studies, yielding novel insights into multifactorial pathologies such as cancer. In particular, these methods have greatly improved the detection and understanding of gene fusions. However, gene fusion identification remains challenging even for state-of-the-art algorithms. They report huge numbers of poorly overlapping candidates, and because every reported fusion would have to be considered for in-lab validation, the candidate lists clearly overwhelm wet-lab capabilities.

Results: In this work, we propose FuGePrior, a novel methodological approach and tool for prioritizing gene fusions from paired-end RNA-Seq data. The proposed pipeline combines state-of-the-art tools for chimeric transcript discovery and prioritization, a series of filtering and processing steps designed in light of the recent literature on gene fusions, and an analysis of the functional reliability of the gene fusion structure.

Conclusions: FuGePrior performance was assessed on two publicly available paired-end RNA-Seq datasets. The first, by Edgren and colleagues, includes four breast cancer cell lines and a normal breast sample, whereas the second, by Ren and colleagues, comprises fourteen primary prostate cancer samples and their paired normal counterparts. FuGePrior reduced the number of fusions output by the chimeric transcript discovery tools by 65 to 75%, depending on the breast cancer cell line considered, and by 37 to 65%, depending on the prostate cancer sample under examination. Furthermore, since both datasets come with a partial validation, we were able to assess how well FuGePrior prioritizes real gene fusions. Specifically, 25 out of 26 validated fusions in the breast cancer dataset were correctly labelled as reliable and biologically significant. Similarly, 2 out of 5 validated fusions in the prostate dataset were recognized as priority by the FuGePrior tool.
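
The validation counts above can be restated as the fraction of validated fusions that the tool retained. The following is a minimal sketch of that arithmetic only. The function name `recovered_fraction` and the dictionary layout are illustrative assumptions and are not part of FuGePrior; the counts are the ones stated in the abstract.

```python
# Counts taken from the abstract: (validated fusions prioritized, validated fusions in total).
# The dictionary layout and names below are illustrative assumptions, not FuGePrior output.
VALIDATION_COUNTS = {
    "breast": (25, 26),
    "prostate": (2, 5),
}


def recovered_fraction(prioritized: int, validated: int) -> float:
    """Return the share of validated fusions that were kept as priority."""
    if validated <= 0:
        raise ValueError("validated must be a positive count")
    if not 0 <= prioritized <= validated:
        raise ValueError("prioritized must lie between 0 and validated")
    return prioritized / validated


if __name__ == "__main__":
    for dataset, (prioritized, validated) in VALIDATION_COUNTS.items():
        share = recovered_fraction(prioritized, validated)
        print(f"{dataset}: {prioritized}/{validated} = {share:.1%}")
```

Under these counts, the sketch gives about 96.2% for the breast cancer dataset and 40.0% for the prostate dataset.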

Keywords: Chimeric transcript discovery tools; Gene fusion prioritization; Gene fusions; RNA-sequencing.

MeSH terms

  • Algorithms
  • Breast Neoplasms / genetics*
  • Cell Line, Tumor
  • Databases, Genetic
  • Female
  • Genomics
  • Humans
  • MCF-7 Cells
  • Male
  • Prostatic Neoplasms / genetics*
  • Recombinant Fusion Proteins / chemistry
  • Recombinant Fusion Proteins / genetics*
  • Reproducibility of Results
  • Sequence Analysis, RNA*
  • Software

Substances

  • Recombinant Fusion Proteins