Large-scale frequent stem pattern mining in RNA families

Jimmy Ka Ho Chiu; Tharam S Dillon; Yi-Ping Phoebe Chen

doi:10.1016/j.jtbi.2018.07.015

J Theor Biol. 2018 Oct 14:455:131-139. doi: 10.1016/j.jtbi.2018.07.015. Epub 2018 Jul 20.

Authors

Jimmy Ka Ho Chiu¹, Tharam S Dillon², Yi-Ping Phoebe Chen³

Affiliations

¹ Department of Computer Science and Information Technology, La Trobe University, Melbourne VIC 3086, Australia. Electronic address: jimmykhchiu@gmail.com.
² Department of Computer Science and Information Technology, La Trobe University, Melbourne VIC 3086, Australia. Electronic address: tharam.dillon7@gmail.com.
³ Department of Computer Science and Information Technology, La Trobe University, Melbourne VIC 3086, Australia. Electronic address: phoebe.chen@latrobe.edu.au.

PMID: 30036526
DOI: 10.1016/j.jtbi.2018.07.015

Abstract

Functionally similar non-coding RNAs are expected to be similar in certain regions of their secondary structures. These similar regions are called common structure motifs and are structurally conserved throughout evolution to maintain their functional roles. Identifying common structure motifs is a critical task in RNA secondary structure analysis. However, current approaches suffer from several limitations, fail to scale with both structure size and the number of input secondary structures, or both. In this work, we present a method that transforms conserved base pair stems into transaction items and applies frequent itemset mining to identify common structure motifs present in a majority of the input structures. Our experimental results on telomerase and ribosomal RNA secondary structures yield frequent stem patterns of biological significance. Moreover, the algorithms used in our method are scalable, so frequent stem patterns can be identified efficiently among many large structures.
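
The general idea of treating each structure as a transaction of stem items can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify how stems are encoded or which mining algorithm is used, so the sketch assumes that each structure has already been reduced to a set of hypothetical stem labels ("S1", "S2", ...) and applies a plain Apriori-style level-wise search. The function name `frequent_itemsets` and the `min_support` threshold are illustrative assumptions.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Apriori-style search. Returns {frozenset_of_items: support_count}.

    transactions: iterable of item sets (here: hypothetical stem labels).
    min_support: fraction of transactions an itemset must appear in.
    """
    transactions = [frozenset(t) for t in transactions]
    min_count = min_support * len(transactions)

    # Level 1: count how many transactions contain each single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(current)

    k = 2
    while current:
        # Join frequent (k-1)-itemsets into size-k candidates and keep only
        # those whose (k-1)-subsets are all frequent (Apriori pruning).
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Count the support of each surviving candidate.
        current = {}
        for cand in candidates:
            count = sum(1 for t in transactions if cand <= t)
            if count >= min_count:
                current[cand] = count
        result.update(current)
        k += 1
    return result


# Hypothetical input: four structures, each reduced to a set of stem labels.
structures = [
    {"S1", "S2", "S3"},
    {"S1", "S2"},
    {"S1", "S3"},
    {"S1", "S2", "S4"},
]

# Require a pattern to occur in at least 75% of the structures.
patterns = frequent_itemsets(structures, min_support=0.75)
for itemset, count in sorted(patterns.items(), key=lambda p: (len(p[0]), sorted(p[0]))):
    print(sorted(itemset), count)
```

With this toy input, the only multi-stem pattern that meets the threshold is the pair S1 and S2, which occurs in three of the four structures. A real pipeline would additionally need a way to decide when stems from different structures count as the same item, which this sketch leaves out.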

Keywords: Algorithm design and analysis; Benchmark testing; NcRNA; RNA; RNA pseudoknot; Secondary structure; Topology.

MeSH terms

Algorithms*
Computer Simulation*
Nucleic Acid Conformation*
RNA / chemistry*
RNA / genetics
RNA, Ribosomal / chemistry*
RNA, Ribosomal / genetics
Sequence Analysis, RNA*
Telomerase / chemistry*
Telomerase / genetics

Substances

RNA, Ribosomal
telomerase RNA
RNA
Telomerase