Alignment free comparison: similarity distribution between the DNA primary sequences based on the shortest absent word

Lianping Yang; Xiangde Zhang; Hegui Zhu

doi:10.1016/j.jtbi.2011.11.021

J Theor Biol. 2012 Feb 21;295:125-31. doi: 10.1016/j.jtbi.2011.11.021. Epub 2011 Dec 1.

Authors

Lianping Yang¹, Xiangde Zhang, Hegui Zhu

Affiliation

¹ College of Sciences, Northeastern University, Shenyang, China.

PMID: 22138094

Abstract

This work proposes an alignment-free comparison model for DNA primary sequences. We consider both strands of the DNA rather than a single strand. We define the shortest absent word of the double strands between DNA sequences and study some of its properties in order to speed up the algorithm that searches for it. We present a novel comparison model in which a similarity distribution is introduced to describe the similarity between sequences. A distance measure is derived from the Shannon entropy and used in phylogenetic analysis. Experiments show that the model performs well in sequence analysis.
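The abstract does not give the formal definition of the shortest absent word between sequences, so the following is only a minimal sketch of the underlying idea under a simpler assumption: it finds the shortest words over {A, C, G, T} that occur on neither strand of a single sequence, by brute-force enumeration. The names `reverse_complement` and `shortest_absent_words` are hypothetical and not taken from the paper, and the speed-up properties, the similarity distribution, and the entropy-based distance the authors describe are not reproduced here.

```python
from itertools import product

# Translation table for the Watson-Crick complement of each base.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def shortest_absent_words(seq: str, alphabet: str = "ACGT") -> list[str]:
    """Return all words of minimal length that occur on neither strand.

    Assumption (not from the paper): a word is 'absent' if it is not a
    substring of the sequence or of its reverse complement.
    """
    seq = seq.upper()
    strands = (seq, reverse_complement(seq))
    k = 1
    while True:
        # Collect every length-k substring seen on either strand.
        present = {s[i:i + k] for s in strands for i in range(len(s) - k + 1)}
        # Enumerate all length-k words in lexicographic order of `alphabet`.
        absent = [
            word
            for word in ("".join(p) for p in product(alphabet, repeat=k))
            if word not in present
        ]
        if absent:
            return absent
        k += 1


if __name__ == "__main__":
    print(shortest_absent_words("AAAA"))
```

The enumeration grows as 4^k, which is acceptable for illustration because the shortest absent word length grows only logarithmically with sequence length; a practical implementation would need the kind of pruning the abstract alludes to.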

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Animals
DNA, Mitochondrial / genetics
Entropy
Humans
Models, Genetic*
Phylogeny
Sequence Alignment
Sequence Analysis, DNA / methods*
Species Specificity

Substances

DNA, Mitochondrial