ASSA: Fast identification of statistically significant interactions between long RNAs

J Bioinform Comput Biol. 2018 Feb;16(1):1840001. doi: 10.1142/S0219720018400012. Epub 2018 Jan 29.

Abstract

The discovery of thousands of long noncoding RNAs (lncRNAs) in mammals raises the question of their functionality. Some of them have been shown to be involved in post-transcriptional regulation of other RNAs and to form intermolecular duplexes with their targets. Sequence alignment tools have been used for transcriptome-wide prediction of RNA-RNA interactions, but such approaches have poor prediction accuracy because they ignore RNA secondary structure. Thermodynamics-based algorithms, in turn, are not computationally feasible for long transcripts on a large scale. Here, we describe ASSA, a new computational pipeline that combines sequence alignment and thermodynamics-based tools for efficient prediction of RNA-RNA interactions between long transcripts. Hybridization strength is measured as the summed energy of all putative duplexes. The main novelty of ASSA is its ability to quickly estimate the statistical significance of the observed interaction energies. Most of the functional hybridizations between long RNAs were classified as statistically significant. ASSA outperformed 11 other tools in terms of the Area Under the Curve on two out of four test sets. Additionally, our results emphasized a unique property of the [Formula: see text] repeats with respect to the RNA-RNA interactions in the human transcriptome. ASSA is available at https://sourceforge.net/projects/assa/.
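
The scoring idea in the abstract (sum the energies of all putative duplexes, then ask whether that sum is unusually strong) can be illustrated with a minimal sketch. This is not ASSA's implementation: the function names, the energy values, and the empirical background-based p-value are all assumptions made for illustration, and the abstract does not specify how ASSA actually estimates significance.

```python
from typing import Sequence


def total_energy(duplex_energies: Sequence[float]) -> float:
    """Sum the energies (kcal/mol) of all putative duplexes for one RNA pair."""
    return sum(duplex_energies)


def empirical_p_value(observed: float, background: Sequence[float]) -> float:
    """Fraction of background sums at least as strong as the observed one.

    More negative energy means stronger hybridization. A pseudocount of 1
    keeps the estimate from ever being exactly zero.
    ASSUMPTION: an empirical background is used here only for illustration.
    """
    hits = sum(1 for energy in background if energy <= observed)
    return (hits + 1) / (len(background) + 1)


# Hypothetical duplex energies for one transcript pair (made-up values).
observed = total_energy([-12.5, -8.0, -20.5])

# Hypothetical summed energies from control pairs (made-up values).
background = [-10.0, -15.5, -42.0, -5.0, -22.0, -30.0, -8.5, -12.0, -18.0]

p = empirical_p_value(observed, background)
print(f"sum energy = {observed} kcal/mol, empirical p = {p}")
```

In this toy case only one of the nine background sums is as strong as the observed one, so the pair would not be called significant at a conventional threshold. A fast significance estimate matters because building such a background explicitly for every transcript pair would be costly at transcriptome scale.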

Keywords: RNA–RNA interactions; hybridization energy; statistical significance; long noncoding RNAs (lncRNAs); natural antisense transcripts (NATs); post-transcriptional regulation.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Animals
  • Humans
  • Mammals / genetics
  • Nucleic Acid Hybridization
  • RNA, Antisense / genetics
  • RNA, Long Noncoding / genetics*
  • Sequence Alignment / methods*
  • Sequence Alignment / statistics & numerical data
  • Thermodynamics
  • Transcriptome

Substances

  • RNA, Antisense
  • RNA, Long Noncoding