Proper distance metrics for phylogenetic analysis using complete genomes without sequence alignment

Int J Mol Sci. 2010 Mar 18;11(3):1141-54. doi: 10.3390/ijms11031141.

Abstract

Most alignment-free correlation distance methods that use composition vectors for phylogenetic analysis of complete genomes share a shortcoming: their "distances" are not proper distance metrics in the strict mathematical sense. In this paper we propose two new correlation-related distance metrics to replace the old "distance" in our dynamical language approach. Four genome datasets are used to evaluate the effects of this replacement from a biological point of view. We find that the two proper distance metrics yield trees whose topologies are the same as, or similar to, those obtained with the old "distance", and that these trees agree with the tree of life based on 16S rRNA in a majority of the basic branches. Hence the two proper correlation-related distance metrics proposed here improve our dynamical language approach for phylogenetic analysis.
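To illustrate what "not a proper distance metric" can mean here, the following minimal Python sketch checks the triangle inequality on three toy vectors. It rests on assumptions the abstract does not state: the old "distance" is taken to have the form (1 - C)/2, where C is the uncentered (cosine) correlation of two composition vectors, and the two alternatives shown (a chord distance and an angular distance) are standard textbook metrics derived from C. They are not necessarily the two metrics proposed in the paper. All function names and the toy vectors are hypothetical.

```python
import math


def cosine_correlation(u, v):
    """Uncentered correlation (cosine of the angle) between two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def old_distance(u, v):
    """Assumed form of the old correlation "distance": (1 - C) / 2."""
    return (1.0 - cosine_correlation(u, v)) / 2.0


def chord_distance(u, v):
    """Euclidean distance between the unit-normalised vectors: sqrt(2 (1 - C))."""
    c = cosine_correlation(u, v)
    return math.sqrt(max(0.0, 2.0 * (1.0 - c)))


def angular_distance(u, v):
    """Angle between the vectors: arccos(C), with C clamped to [-1, 1]."""
    c = max(-1.0, min(1.0, cosine_correlation(u, v)))
    return math.acos(c)


def satisfies_triangle(dist, a, b, c, tol=1e-9):
    """Check d(a, c) <= d(a, b) + d(b, c) for one ordered triple."""
    return dist(a, c) <= dist(a, b) + dist(b, c) + tol


# Toy "composition vectors" (hypothetical, two-dimensional for clarity):
# b lies halfway between a and c in angle.
a = (1.0, 0.0)
b = (1.0, 1.0)
c = (0.0, 1.0)

for name, dist in [("old (1 - C)/2", old_distance),
                   ("chord", chord_distance),
                   ("angular", angular_distance)]:
    print(name, satisfies_triangle(dist, a, b, c))
```

On this triple the assumed old "distance" gives d(a, b) + d(b, c) of roughly 0.29 against d(a, c) = 0.5, so the triangle inequality fails, whereas the chord and angular distances both satisfy it. A single passing triple does not prove that a function is a metric; the check is only useful for finding counterexamples.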

Keywords: complete genome; composition vector; correlation-related distance metric; phylogenetic analysis.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Animals
  • Genome, Bacterial
  • Genome, Plant
  • Genomics / methods*
  • Phylogeny*
  • Sequence Alignment / methods*