Classification of various genomic sequences based on distribution of repeated k-word

Annu Int Conf IEEE Eng Med Biol Soc. 2017 Jul:2017:3894-3897. doi: 10.1109/EMBC.2017.8037707.

Abstract

Both alignment-based and alignment-free methods are used to extract phylogenetic information from DNA sequences. Alignment-based methods have high computational complexity, while conventional alignment-free methods have low accuracy. In this paper, a new alignment-free method based on the distribution of repeated k-words is proposed. The new measure is based on k-words and their multiple repeated words. The proposed scheme achieves higher performance than conventional word count methods while maintaining the same total time complexity. In terms of Robinson-Foulds (RF) distance, the proposed measure performs better than conventional alignment-free methods.
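The abstract does not define the repeated k-word measure itself, so the sketch below illustrates only the conventional word count baseline that it is compared against: each sequence is mapped to a vector of overlapping k-word frequencies, and pairs of sequences are compared by the distance between those vectors. The function names, the restriction to the A/C/G/T alphabet, and the choice of Euclidean distance are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter
from itertools import product
import math

ALPHABET = "ACGT"  # assumption: plain DNA alphabet, other symbols are skipped


def kword_counts(seq, k):
    """Count overlapping k-words in seq, ignoring words with non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= set(ALPHABET):
            counts[word] += 1
    return counts


def kword_frequencies(seq, k):
    """Return the frequency vector over all 4**k possible k-words, in fixed order."""
    counts = kword_counts(seq, k)
    total = sum(counts.values())
    words = ["".join(p) for p in product(ALPHABET, repeat=k)]
    if total == 0:
        return [0.0] * len(words)
    return [counts[w] / total for w in words]


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length frequency vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_matrix(seqs, k):
    """Pairwise word count distances between sequences (symmetric, zero diagonal)."""
    vectors = [kword_frequencies(s, k) for s in seqs]
    return [[euclidean_distance(u, v) for v in vectors] for u in vectors]


if __name__ == "__main__":
    # Toy sequences, invented for this example only.
    toy = ["ACGTACGTAC", "ACGTTCGTAC", "GGGGCCCCGG"]
    for row in distance_matrix(toy, k=2):
        print(["%.3f" % d for d in row])
```

A distance matrix of this kind is typically passed to a tree-building method such as neighbor joining, and the resulting tree is then compared with a reference tree using RF distance. Counting the k-words takes time linear in the sequence length for a fixed k, which is the low cost that alignment-free methods are valued for.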

MeSH terms

  • Algorithms
  • Genome
  • Genomics*
  • Phylogeny
  • Sequence Alignment
  • Sequence Analysis, DNA