Codon-based encoding for DNA sequence analysis

Methods. 2014 Jun 1;67(3):373-9. doi: 10.1016/j.ymeth.2014.01.016. Epub 2014 Feb 13.

Abstract

With the exponential growth of biological sequence data (DNA or protein sequences), DNA sequence analysis has become an essential task for biologists seeking to understand the features, functions, structures, and evolution of species. Encoding DNA sequences is an effective method for extracting features from them. It is commonly used for visualizing DNA sequences and for analyzing similarities and dissimilarities between different species or cells. Although many encoding approaches have been proposed for DNA sequence analysis, more elegant approaches are still needed to achieve higher accuracy. In this paper, we propose a novel encoding approach for measuring the degree of similarity/dissimilarity between different species. Our approach preserves the physicochemical properties, positional information, and codon usage bias of nucleotides. An extensive performance study shows that our approach provides higher accuracy than existing approaches in terms of the degree of similarity.

Keywords: Codon; DNA visualization; Encoding DNA sequence; Sequence similarity.
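
The abstract does not spell out the encoding itself, so the following is only a generic illustration of the idea of turning a DNA sequence into a numeric feature vector and comparing two such vectors. It is a minimal sketch that assumes a plain codon-frequency encoding read in frame 0 and a Euclidean distance; it is not the paper's method, and it ignores the physicochemical and positional information the authors say their approach preserves. The names `codon_frequencies` and `euclidean_distance` are hypothetical.

```python
from collections import Counter
from itertools import product
import math

# All 64 codons in a fixed order, so every sequence maps to the same vector layout.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]
CODON_SET = set(CODONS)


def codon_frequencies(seq):
    """Return a 64-element vector of relative codon frequencies (frame 0).

    Assumption: codons containing anything other than A, C, G, T are skipped,
    and a trailing partial codon is ignored.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_SET)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(CODONS)
    return [counts[c] / total for c in CODONS]


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    a = codon_frequencies("ATGGCCAAATAA")
    b = codon_frequencies("ATGGCGAAGTAA")
    # A smaller distance means more similar codon usage under this toy encoding.
    print(euclidean_distance(a, b))
```

Under this toy encoding, two sequences with identical codon usage have distance 0, and the distance grows as their codon usage diverges.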

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Codon*
  • DNA Mutational Analysis
  • Phylogeny
  • Sequence Analysis, DNA / methods*

Substances

  • Codon