On K-peptide length in composition vector phylogeny of prokaryotes

Comput Biol Chem. 2014 Dec:53 Pt A:166-73. doi: 10.1016/j.compbiolchem.2014.08.021. Epub 2014 Aug 20.

Abstract

Using an enlarged alphabet of K-tuples is the way to carry out alignment-free comparison of genomes in the composition vector (CV) approach to prokaryotic phylogeny. We summarize the known aspects concerning the choice of K and examine the results of using CVs with subtraction of a statistical background for K=3-9 and of using raw CVs without subtraction for K=1-12. The evaluation criterion is direct comparison with taxonomy. For prokaryotes, the best performance is obtained for K=5 and 6 with subtraction and for K=11, 12 or even more without subtraction. In general, CVs with subtraction are slightly better and less CPU-intensive, but CVs without subtraction may provide complementary information.

Keywords: Alignment-free; Composition vector; Prokaryote phylogeny and taxonomy; Subtraction procedure; Whole-genome-based.
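
The abstract does not spell out how a CV is built or what the subtraction procedure computes. As a rough illustration only, the sketch below assumes the formulation usually described for the CV approach: K-peptide frequencies are counted over a genome's protein sequences, a background frequency is predicted from the (K-1)- and (K-2)-peptide frequencies (a Markov-type estimate), the relative difference becomes the vector component, and two genomes are compared through the correlation of their vectors. The function names (kmer_freqs, composition_vector, cv_distance) and the toy sequences are hypothetical and are not taken from the paper or from any published software.

```python
from collections import Counter
from math import sqrt


def kmer_freqs(proteins, k):
    """Relative frequencies of all overlapping k-peptides in a list of protein strings."""
    counts = Counter()
    for seq in proteins:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def composition_vector(proteins, k, subtract=True):
    """Sparse CV as a dict {K-peptide: component}.

    subtract=False returns the raw K-peptide frequencies.
    subtract=True applies the ASSUMED background subtraction:
        f0(a1..aK) = f(a1..aK-1) * f(a2..aK) / f(a2..aK-1)
        component  = (f - f0) / f0   for every K-peptide with f0 != 0
    """
    f_k = kmer_freqs(proteins, k)
    if not subtract:
        return f_k
    if k < 3:
        raise ValueError("this sketch needs K >= 3 for subtraction")
    f_k1 = kmer_freqs(proteins, k - 1)
    f_k2 = kmer_freqs(proteins, k - 2)

    # Index (K-1)-peptides by their first K-2 letters so that each prefix
    # can be joined with every suffix that overlaps it.
    by_head = {}
    for s in f_k1:
        by_head.setdefault(s[:-1], []).append(s)

    cv = {}
    for p in f_k1:                       # p = a1..aK-1
        middle = p[1:]                   # a2..aK-1
        for s in by_head.get(middle, []):  # s = a2..aK
            word = p + s[-1]
            f0 = f_k1[p] * f_k1[s] / f_k2[middle]
            cv[word] = (f_k.get(word, 0.0) - f0) / f0
    return cv


def cv_distance(a, b):
    """Distance (1 - C) / 2, where C is the cosine-type correlation of two sparse CVs."""
    dot = sum(v * b[w] for w, v in a.items() if w in b)
    norm = sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    if norm == 0.0:
        raise ValueError("cannot compare a zero vector")
    return (1.0 - dot / norm) / 2.0


if __name__ == "__main__":
    # Toy "proteomes"; real use would read all protein sequences of a genome.
    genome_a = ["MKVLAAGIVMKVLA", "MKTLLAGIV"]
    genome_b = ["MKVLSAGIVMRVLA", "MKTLIAGLV"]
    cv_a = composition_vector(genome_a, 3)
    cv_b = composition_vector(genome_b, 3)
    print(cv_distance(cv_a, cv_b))
```

Storing the vectors as dictionaries keeps them sparse, which matters because the number of possible K-peptides grows as 20^K; at the larger K values mentioned for raw CVs, a dense array would be impractical.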

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Archaea / classification*
  • Archaea / genetics
  • Archaeal Proteins / chemistry
  • Archaeal Proteins / genetics
  • Bacteria / classification*
  • Bacteria / genetics
  • Bacterial Proteins / chemistry
  • Bacterial Proteins / genetics
  • Genome, Archaeal*
  • Genome, Bacterial*
  • Peptides / chemistry
  • Peptides / genetics
  • Phylogeny*
  • Sequence Analysis, DNA
  • Sequence Analysis, Protein

Substances

  • Archaeal Proteins
  • Bacterial Proteins
  • Peptides