SMART: SuperMaximal approximate repeats tool

Bioinformatics. 2020 Apr 15;36(8):2589-2591. doi: 10.1093/bioinformatics/btz953.

Abstract

Summary: State-of-the-art repeat analysis tools rely on extending maximal repeated pairs to enumerate maximal k-mismatch repeats. These pairs can be quadratic in n, the length of the input sequence, and thus greedy heuristics are applied to speed up the extension. Here, we introduce supermaximal k-mismatch repeats, which are linear in n and capture all maximal k-mismatch repeats: every maximal k-mismatch repeat is a substring of some supermaximal k-mismatch repeat. We present SMART, a tool based on recent algorithmic advances implemented in C++ to compute supermaximal k-mismatch repeats directly, and show that these elements are statistically much more significant than the output of the state-of-the-art.

Availability and implementation: http://github.com/lorrainea/smart (GNU GPL v3.0).

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't