Phylogenetic analysis of DNA sequences based on the generalized pseudo-amino acid composition

J Theor Biol. 2011 Jan 21;269(1):217-23. doi: 10.1016/j.jtbi.2010.10.027. Epub 2010 Oct 30.

Abstract

We propose a new method for phylogenetic analysis of DNA primary sequences, based on the idea of the pseudo-amino acid composition (PseAA). In our method, the occurrence frequencies of the 20 amino acids used in the pseudo-amino acid composition are replaced by the frequencies of the 16 dinucleotides. We then select eight LZ complexity factors, computed from eight (0,1) sequences derived from a DNA primary sequence, as the PseAA components. A DNA sequence is thus characterized by a 24-dimensional vector (16 dinucleotide frequencies plus 8 LZ complexity factors). We reconstruct the phylogenetic trees of two datasets using this representation. The results show that our method is efficient and significant.
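The construction of the feature vector can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions. The abstract does not say which eight (0,1) sequences are used, so `EXAMPLE_ENCODINGS` contains only three common, purely illustrative binary encodings (purine/pyrimidine, amino/keto, weak/strong), and the resulting vector has 16 + 3 components rather than 24. The abstract also does not define the "LZ complexity factor", so `lz_complexity` simply returns the number of components in a Lempel-Ziv (1976) style parsing, without any normalization. All function and variable names are hypothetical.

```python
from itertools import product

NUCLEOTIDES = "ACGT"
# The 16 dinucleotides in a fixed order: AA, AC, AG, AT, CA, ...
DINUCLEOTIDES = ["".join(p) for p in product(NUCLEOTIDES, repeat=2)]

# ASSUMPTION: illustrative binary encodings only. The abstract does not
# specify the eight (0,1) sequences used in the paper.
EXAMPLE_ENCODINGS = {
    "purine": set("AG"),  # A, G -> 1; C, T -> 0
    "amino": set("AC"),   # A, C -> 1; G, T -> 0
    "weak": set("AT"),    # A, T -> 1; C, G -> 0
}


def dinucleotide_frequencies(seq):
    """Relative frequencies of the 16 overlapping dinucleotides in seq."""
    counts = dict.fromkeys(DINUCLEOTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # ignore pairs containing non-ACGT symbols
            counts[pair] += 1
    total = sum(counts.values()) or 1  # avoid division by zero
    return [counts[d] / total for d in DINUCLEOTIDES]


def lz_complexity(s):
    """Number of components in a Lempel-Ziv (1976) style parsing of s.

    Each component is the shortest substring starting at the current
    position that does not occur in the text preceding its last symbol.
    """
    n = len(s)
    i = 0
    components = 0
    while i < n:
        length = 1
        # Extend the candidate while it can be copied from earlier text.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        components += 1
        i += length
    return components


def to_binary(seq, ones):
    """Map a DNA sequence to a (0,1) string: '1' for symbols in `ones`."""
    return "".join("1" if base in ones else "0" for base in seq)


def feature_vector(seq, encodings=EXAMPLE_ENCODINGS):
    """16 dinucleotide frequencies followed by one LZ value per encoding."""
    clean = "".join(b for b in seq.upper() if b in NUCLEOTIDES)
    features = dinucleotide_frequencies(clean)
    for ones in encodings.values():
        features.append(lz_complexity(to_binary(clean, ones)))
    return features


if __name__ == "__main__":
    vec = feature_vector("ACGTACGTTGCAAGCT")
    print(len(vec))  # 16 + number of encodings supplied
```

Passing a dictionary with eight encodings to `feature_vector` would yield a 24-dimensional vector of the shape described above. Sequences could then be compared by a distance between their vectors, although the distance measure and tree-building procedure used in the paper are not stated in the abstract.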

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acids / genetics*
  • Animals
  • Base Sequence
  • DNA / genetics*
  • Databases, Nucleic Acid
  • Hepatitis E virus / genetics
  • Mammals / genetics
  • Phylogeny*
  • Sequence Analysis, DNA / methods*
  • Species Specificity

Substances

  • Amino Acids
  • DNA