Repseek, a tool to retrieve approximate repeats from large DNA sequences

Bioinformatics. 2007 Jan 1;23(1):119-21. doi: 10.1093/bioinformatics/btl519. Epub 2006 Oct 11.

Abstract

Chromosomes and other long DNA sequences contain many highly similar repeated subsequences. While efficient methods exist for detecting strict repeats or already characterized repeats, no software is available for detecting approximate repeats in large DNA sequences that allows weighted substitutions and indels within a coherent statistical framework. Here, we present an implementation of a two-step method (seed detection followed by seed extension) that detects such approximate repeats. Our method is computationally efficient enough to handle large sequences and flexible enough to account for influencing factors, such as sequence-composition biases, at both the seed-detection and alignment levels.
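The generic seed-and-extend idea behind a two-step method can be illustrated with a minimal Python sketch. It assumes exact k-mer seeds, direct (non-inverted) repeats only, and ungapped X-drop extension with plain match/mismatch scores. This is not Repseek's algorithm: the abstract states that Repseek also handles indels, weighted substitutions and sequence-composition biases in a statistical framework. All function names and parameters here (`k`, `xdrop`, `min_score`) are hypothetical.

```python
from collections import defaultdict


def find_seeds(seq, k):
    """Exact k-mer seeds: all pairs (i, j), i < j, with seq[i:i+k] == seq[j:j+k]."""
    index = defaultdict(list)
    for pos in range(len(seq) - k + 1):
        index[seq[pos:pos + k]].append(pos)
    seeds = []
    for positions in index.values():
        # Every pair of occurrences of the same k-mer is a seed (quadratic in multiplicity).
        for a, i in enumerate(positions):
            for j in positions[a + 1:]:
                seeds.append((i, j))
    return sorted(seeds)


def xdrop_extend(seq, i, j, step, match, mismatch, xdrop):
    """Ungapped X-drop extension from (i, j), moving by `step` (+1 right, -1 left).

    Returns (length, score_gain) of the best-scoring extension found.
    """
    score = best_score = 0
    best_len = n = 0
    while 0 <= i + step * n < len(seq) and 0 <= j + step * n < len(seq):
        a, b = seq[i + step * n], seq[j + step * n]
        score += match if a == b else mismatch
        n += 1
        if score > best_score:
            best_score, best_len = score, n
        elif best_score - score > xdrop:
            break  # score fell too far below the best seen: stop extending
    return best_len, best_score


def seed_and_extend(seq, k=4, match=1, mismatch=-1, xdrop=3, min_score=8):
    """Return sorted (start1, start2, length, score) tuples for extended seeds."""
    repeats = set()
    for i, j in find_seeds(seq, k):
        left_len, left_gain = xdrop_extend(seq, i - 1, j - 1, -1, match, mismatch, xdrop)
        right_len, right_gain = xdrop_extend(seq, i + k, j + k, +1, match, mismatch, xdrop)
        score = k * match + left_gain + right_gain
        if score >= min_score:
            # Seeds inside the same repeat usually extend to the same interval;
            # the set removes exact duplicates.
            repeats.add((i - left_len, j - left_len, left_len + k + right_len, score))
    return sorted(repeats)


if __name__ == "__main__":
    seq = "TTTT" + "ACGATCAGTC" + "GGGG" + "ACGATCAGTC" + "CCCC"
    print(seed_and_extend(seq))  # [(4, 18, 10, 10)]
```

In this toy sequence, the seven 4-mer seeds inside the two copies of `ACGATCAGTC` all extend to the same repeat, while the spurious `CCCC` seed at positions 27 and 28 fails the score threshold. A real tool would also search inverted repeats, allow gaps, and score alignments against a background model of sequence composition.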

Availability: http://wwwabi.snv.jussieu.fr/public/RepSeek/

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Base Sequence*
  • DNA / chemistry
  • Information Storage and Retrieval / methods*
  • Information Systems
  • Sequence Analysis, DNA
  • Software*
  • Tandem Repeat Sequences

Substances

  • DNA