A Bioinformatics Approach for Detecting Repetitive Nested Motifs using Pattern Matching

Evol Bioinform Online. 2016 Oct 30;12:247-251. doi: 10.4137/EBO.S40138. eCollection 2016.

Abstract

The identification of nested motifs in genomic sequences is a complex computational problem. Detecting these patterns is important for discovering transposable element (TE) insertions, incomplete reverse transcripts, deletions, and/or mutations. In this study, a de novo strategy for detecting patterns that represent nested motifs was designed, based on exhaustive searches for pairs of motifs and on combinatorial pattern analysis. These patterns fall into three categories: motifs within other motifs, motifs flanked by other motifs, and motifs of large size. When applied to genomic sequences from the plant species Aegilops tauschii and Oryza sativa, the methodology showed that putative nested TEs can be identified by detecting these three types of patterns. The results were validated through BLAST alignments, which demonstrated the efficacy and usefulness of the new method, called Mamushka.
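The abstract does not describe how Mamushka implements its searches, so the following is only a minimal, hypothetical sketch of what two of the three pattern types could look like under exact string matching. It assumes motifs are already known strings. A "motif within another motif" is modeled as a host motif interrupted by an exact copy of a guest motif, and a "flanked" motif is modeled as a core motif with a flanking motif directly on each side, allowing an optional gap (`max_gap`, an assumed parameter). The function names `occurrences`, `flanked`, and `inserted_within` are illustrative inventions. The third category (motifs of large size) is not covered here.

```python
from typing import List, Tuple


def occurrences(seq: str, motif: str) -> List[int]:
    """Start positions of every exact (possibly overlapping) match of motif in seq."""
    hits, pos = [], seq.find(motif)
    while pos != -1:
        hits.append(pos)
        pos = seq.find(motif, pos + 1)
    return hits


def flanked(seq: str, flank: str, core: str, max_gap: int = 0) -> List[Tuple[int, int, int]]:
    """Hypothetical 'flanked' pattern: flank ... core ... flank.

    Returns (left_flank_start, core_start, right_flank_start) triples where the
    gap between each flank copy and the core is at most max_gap bases.
    """
    flank_starts = occurrences(seq, flank)
    results = []
    for c in occurrences(seq, core):
        c_end = c + len(core)
        for left in flank_starts:
            # The left flank must end at most max_gap bases before the core starts.
            if not 0 <= c - (left + len(flank)) <= max_gap:
                continue
            for right in flank_starts:
                # The right flank must start at most max_gap bases after the core ends.
                if 0 <= right - c_end <= max_gap:
                    results.append((left, c, right))
    return results


def inserted_within(seq: str, host: str, guest: str) -> List[Tuple[int, int]]:
    """Hypothetical 'within' pattern: host split by an exact copy of guest.

    For every internal split point k, search for host[:k] + guest + host[k:].
    Returns (match_start, k) pairs.
    """
    results = []
    for k in range(1, len(host)):
        pattern = host[:k] + guest + host[k:]
        results.extend((start, k) for start in occurrences(seq, pattern))
    return results


if __name__ == "__main__":
    # Host "AAACCCGGG" interrupted after its 6th base by the guest "TTTT".
    seq = "GA" + "AAACCC" + "TTTT" + "GGG" + "CA"
    print(inserted_within(seq, "AAACCCGGG", "TTTT"))  # [(2, 6)]
    # Core "TTTT" flanked on both sides by "TAG".
    print(flanked("CCTAGTTTTTAGCC", "TAG", "TTTT"))  # [(2, 5, 9)]
```

The nested loops keep the sketch readable but scale poorly. A practical tool would more likely index the sequence (for example with a suffix array) before enumerating motif pairs. Note also that when the guest shares bases with the host at the split point, several values of `k` can describe the same insertion.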

Keywords: exact sequence analysis; nested motifs; repetitive motifs; structural bioinformatics.