Alignment free comparison: k word voting model and its applications

J Theor Biol. 2013 Oct 21;335:276-82. doi: 10.1016/j.jtbi.2013.06.037. Epub 2013 Jul 10.

Abstract

Alignment-free sequence comparison is widely used in sequence analysis, especially in computational biology for large-scale similarity comparison. In this paper, we propose a word voting model for comparing biological sequences without alignment. Unlike many comparison methods based on k-words, this model does not use k-word frequencies or statistics, so there is no limitation on the choice of k. Instead, the model characterizes the differences among biological sequences using the information entropy of a gamma distribution. Finally, we applied the model to similarity search and phylogenetic tree construction to further validate it.
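The abstract does not state the model's formulas. As a reference point only, the standard differential entropy of a gamma distribution with shape α and scale θ is H = α + ln θ + ln Γ(α) + (1 − α)ψ(α), where Γ is the gamma function and ψ is the digamma function. The sketch below computes this quantity with the Python standard library. The helper names (`digamma`, `gamma_entropy`, `fit_gamma_moments`) and the method-of-moments fit are hypothetical illustrations, not the authors' voting procedure; how the model derives gamma parameters from k-words is not described here.

```python
import math
import statistics


def digamma(x: float) -> float:
    """Approximate the digamma function psi(x) for x > 0 (stdlib only)."""
    if x <= 0:
        raise ValueError("this sketch only handles x > 0")
    result = 0.0
    # Recurrence psi(x) = psi(x + 1) - 1/x moves x into the range
    # where the asymptotic series below is accurate.
    while x < 10.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # Asymptotic series: ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += (math.log(x) - 0.5 * inv
               - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))
    return result


def gamma_entropy(shape: float, scale: float) -> float:
    """Differential entropy (in nats) of Gamma(shape, scale)."""
    if shape <= 0 or scale <= 0:
        raise ValueError("shape and scale must be positive")
    return (shape + math.log(scale) + math.lgamma(shape)
            + (1.0 - shape) * digamma(shape))


def fit_gamma_moments(samples):
    """Hypothetical helper: method-of-moments gamma fit to positive samples.

    shape = mean^2 / variance, scale = variance / mean (population variance).
    """
    mean = statistics.mean(samples)
    var = statistics.pvariance(samples)
    if mean <= 0 or var <= 0:
        raise ValueError("need positive samples with nonzero spread")
    return mean * mean / var, var / mean


if __name__ == "__main__":
    # Gamma(1, 1) is the unit exponential, whose entropy is exactly 1 nat.
    print(gamma_entropy(1.0, 1.0))
    shape, scale = fit_gamma_moments([1.0, 2.0, 3.0])
    print(shape, scale, gamma_entropy(shape, scale))
```

A hand-rolled digamma keeps the sketch dependency-free; with SciPy available, `scipy.stats.gamma(a, scale=theta).entropy()` gives the same quantity.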

Keywords: Gamma distribution; Information entropy; Large scale comparison.

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Entropy
  • Models, Genetic*
  • Sequence Analysis, DNA / methods*