Alignment free comparison: similarity distribution between the DNA primary sequences based on the shortest absent word

J Theor Biol. 2012 Feb 21:295:125-31. doi: 10.1016/j.jtbi.2011.11.021. Epub 2011 Dec 1.

Abstract

This work proposes an alignment-free comparison model for DNA primary sequences. We consider both strands of the DNA rather than a single strand. We define the shortest absent word of the double strands between DNA sequences and study several of its properties to speed up the search for it. We present a novel comparison model in which a similarity distribution is introduced to describe the similarity between sequences. A distance measure based on the Shannon entropy is derived and applied to phylogenetic analysis. Experiments indicate that the model performs well in sequence analysis.
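The abstract does not give the formal definition of the shortest absent word, so the following is only a minimal sketch under stated assumptions: it treats a word as absent when it occurs in neither the sequence nor its reverse complement, and finds the shortest such words by brute-force enumeration over increasing lengths. The function names `reverse_complement` and `shortest_absent_words` are hypothetical, and the sketch uses neither the pairwise ("between the DNA sequences") definition nor the speed-up properties studied in the paper.

```python
from itertools import product

# Assumption: sequences use only the unambiguous bases A, C, G, T.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement, i.e. the opposite strand read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def shortest_absent_words(seq, alphabet="ACGT"):
    """Return (k, words): the smallest length k at which some word over
    `alphabet` occurs on neither strand, and all such words of length k.

    Brute-force illustration only; not the paper's algorithm.
    """
    seq = seq.upper()
    strands = (seq, reverse_complement(seq))
    k = 1
    while True:
        # Collect every k-mer that occurs on either strand.
        present = {s[i:i + k] for s in strands for i in range(len(s) - k + 1)}
        # Enumerate all candidate words of length k in lexicographic order.
        absent = ["".join(w) for w in product(alphabet, repeat=k)
                  if "".join(w) not in present]
        if absent:
            return k, absent
        k += 1


if __name__ == "__main__":
    k, words = shortest_absent_words("AAAA")
    print(k, words)
```

The loop always terminates: once k exceeds the sequence length no k-mer is present, so every word of that length is absent. Enumerating all 4^k words is only practical for small k; a real implementation would rely on an indexed structure or on properties such as those the paper studies.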

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • DNA, Mitochondrial / genetics
  • Entropy
  • Humans
  • Models, Genetic*
  • Phylogeny
  • Sequence Alignment
  • Sequence Analysis, DNA / methods*
  • Species Specificity

Substances

  • DNA, Mitochondrial