SeqOthello: querying RNA-seq experiments at scale

Genome Biol. 2018 Oct 19;19(1):167. doi: 10.1186/s13059-018-1535-9.

Abstract

We present SeqOthello, an ultra-fast and memory-efficient indexing structure that supports arbitrary sequence queries against large collections of RNA-seq experiments. SeqOthello needs only 5 min and 19.1 GB of memory to conduct a global survey of 11,658 fusion events against 10,113 TCGA Pan-Cancer RNA-seq datasets. The query recovers 92.7% of the tier-1 fusions curated by the TCGA Fusion Gene Database and reveals 270 novel occurrences, all of which are tumor-specific. By providing a reference-free, alignment-free, and parameter-free sequence search system, SeqOthello will enable large-scale integrative studies using sequence-level data, an undertaking not previously practicable for many individual labs.

Keywords: Compression; Gene fusion; Othello; Pan-cancer; Query; RNA-seq; SeqOthello; TCGA.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Base Sequence
  • Gene Fusion
  • Humans
  • Neoplasms / genetics
  • Search Engine
  • Sequence Analysis, RNA / methods*
  • Software*