Large-scale frequent stem pattern mining in RNA families

J Theor Biol. 2018 Oct 14:455:131-139. doi: 10.1016/j.jtbi.2018.07.015. Epub 2018 Jul 20.

Abstract

Functionally similar non-coding RNAs are expected to share similar regions in their secondary structures. These regions, called common structure motifs, are structurally conserved throughout evolution to maintain their functional roles. Identifying common structure motifs is one of the critical tasks in RNA secondary structure analysis. Nevertheless, current approaches suffer from several limitations and/or do not scale with both structure size and the number of input secondary structures. In this work, we present a method that transforms conserved base pair stems into transaction items and applies frequent itemset mining to identify common structure motifs present in a majority of the input structures. Our experimental results on telomerase and ribosomal RNA secondary structures report frequent stem patterns that are of biological significance. Moreover, the algorithms used in our method are scalable, so frequent stem patterns can be identified efficiently among many large structures.
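To make the transaction view concrete, the following is a minimal sketch of frequent itemset mining over stems, not the authors' implementation. It assumes that each input structure has already been reduced to a set of stem labels that are comparable across structures; the labels (`S1`, `S2`, ...), the function name `frequent_stem_patterns`, and the `min_support` threshold are all hypothetical, and a plain Apriori-style search stands in for whichever mining algorithm the paper actually uses.

```python
from itertools import combinations


def frequent_stem_patterns(transactions, min_support=0.5):
    """Apriori-style search over stem sets.

    transactions: list of sets, one set of stem labels per RNA structure.
    Returns a dict mapping frozenset(stem labels) -> support fraction.
    """
    n = len(transactions)
    min_count = min_support * n

    # Level 1: count how many structures contain each single stem.
    counts = {}
    for stems in transactions:
        for stem in stems:
            key = frozenset([stem])
            counts[key] = counts.get(key, 0) + 1
    current = {k for k, c in counts.items() if c >= min_count}
    result = {k: counts[k] / n for k in current}

    size = 2
    while current:
        # Join frequent (size-1)-sets into candidate sets of the next size.
        candidates = {a | b for a in current for b in current
                      if len(a | b) == size}
        # Prune candidates that have an infrequent (size-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, size - 1))}
        # Count support of each surviving candidate.
        counts = {c: sum(1 for stems in transactions if c <= stems)
                  for c in candidates}
        current = {c for c, k in counts.items() if k >= min_count}
        result.update({c: counts[c] / n for c in current})
        size += 1
    return result


# Hypothetical toy input: four structures, each a set of stem labels.
structures = {
    "rna_1": {"S1", "S2", "S3"},
    "rna_2": {"S1", "S2"},
    "rna_3": {"S1", "S3"},
    "rna_4": {"S2", "S3", "S4"},
}
patterns = frequent_stem_patterns(list(structures.values()), min_support=0.5)
for stems, support in sorted(patterns.items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(stems), support)
```

In this toy input, a stem pattern is kept when it occurs in at least half of the structures. How stems from different structures are matched to a shared label is the step the abstract refers to as transforming conserved stems into transaction items, and it is left out of this sketch.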

Keywords: Algorithm design and analysis; Benchmark testing; NcRNA; RNA; RNA pseudoknot; Secondary structure; Topology.

MeSH terms

  • Algorithms*
  • Computer Simulation*
  • Nucleic Acid Conformation*
  • RNA / chemistry*
  • RNA / genetics
  • RNA, Ribosomal / chemistry*
  • RNA, Ribosomal / genetics
  • Sequence Analysis, RNA*
  • Telomerase / chemistry*
  • Telomerase / genetics

Substances

  • RNA, Ribosomal
  • telomerase RNA
  • RNA
  • Telomerase