Using an enlarged alphabet of K-tuples is the way to carry out alignment-free comparison of genomes in the composition vector (CV) approach to prokaryotic phylogeny. We summarize the known aspects concerning the choice of K and examine the results of using CVs with subtraction of a statistical background for K=3-9 and using raw CVs without subtraction for K=1-12. The criterion for evaluation consists in direct comparison with taxonomy. For prokaryotes the best performances are obtained for K=5 and 6 with subtraction and for K=11, 12 or even more without subtraction. In general, CVs with subtractions are slightly better and less CPU consuming, but CVs without subtraction may provide complementary information.
Keywords: Alignment-free; Composition vector; Prokaryote phylogeny and taxonomy; Subtraction procedure; Whole-genome-based.
Copyright © 2014 The Authors. Published by Elsevier Ltd.. All rights reserved.