Prokaryote phylogeny without sequence alignment: from avoidance signature to composition distance

J Bioinform Comput Biol. 2004 Mar;2(1):1-19. doi: 10.1142/s0219720004000442.

Abstract

This is a review of a new and essentially simple method of inferring phylogenetic relationships from complete genome data without using sequence alignment. The method is based on counting the frequency of occurrence of oligopeptides of a fixed length K (up to K = 6) in the collection of protein sequences of a species. It requires neither fine adjustment of parameters nor selection of genes. Applied to prokaryotic genomes, it has led to results comparable with the bacteriologists' systematics as reflected in the latest (2002) outline of Bergey's Manual of Systematic Bacteriology. The method has also been used to compare chloroplast genomes and to infer the phylogeny of coronaviruses, including human SARS-CoV. A key point in our approach is the subtraction of a random background from the original counts, using a Markov model of order K-2, in order to highlight the shaping role of natural selection. The implications of the subtraction procedure are analyzed in particular, and directions for further development of the new approach are indicated.
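
The counting and background-subtraction steps described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the formulas, so everything below is an assumption: the background frequency of a K-peptide is predicted from the (K-1)- and (K-2)-peptide frequencies in the usual Markov-chain manner, each vector component is the relative deviation from that prediction, and the distance is derived from the cosine of the angle between two vectors. The function names (`kmer_freqs`, `composition_vector`, `composition_distance`) and the toy sequences are hypothetical and do not come from the paper.

```python
from collections import Counter, defaultdict
from math import sqrt


def kmer_freqs(proteins, k):
    """Relative frequencies of all overlapping k-peptides in a protein set."""
    counts = Counter()
    for seq in proteins:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()} if total else {}


def composition_vector(proteins, k):
    """Background-subtracted K-peptide vector (assumed formulation).

    Predicted frequency, assuming a Markov model of order K-2:
        f0(a1..aK) = f(a1..aK-1) * f(a2..aK) / f(a2..aK-1)
    Component: (f - f0) / f0, defined only where f0 is non-zero.
    """
    if k < 3:
        raise ValueError("k must be at least 3 for a (k-2)-order background")
    f_k = kmer_freqs(proteins, k)
    f_k1 = kmer_freqs(proteins, k - 1)
    f_k2 = kmer_freqs(proteins, k - 2)

    # Index (K-1)-peptides by their first K-2 letters so that overlapping
    # prefix/suffix pairs can be joined into candidate K-peptides.
    by_head = defaultdict(list)
    for suffix in f_k1:
        by_head[suffix[:-1]].append(suffix)

    vector = {}
    for prefix in f_k1:
        core = prefix[1:]  # shared (K-2)-peptide, always present in f_k2
        for suffix in by_head.get(core, []):
            word = prefix + suffix[-1]
            f0 = f_k1[prefix] * f_k1[suffix] / f_k2[core]
            vector[word] = (f_k.get(word, 0.0) - f0) / f0
    return vector


def composition_distance(u, v):
    """Distance (1 - C) / 2, where C is the cosine between sparse vectors."""
    dot = sum(x * v.get(word, 0.0) for word, x in u.items())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.5  # convention chosen for this sketch: treat as uncorrelated
    return (1.0 - dot / (norm_u * norm_v)) / 2.0


# Toy usage with made-up sequences (not real proteomes).
species_a = ["MKVLAAGIVKMKVLA", "MKKLLAGIV"]
species_b = ["MRVLSAGIVRMRVLS", "MRKLLSGIV"]
vec_a = composition_vector(species_a, 3)
vec_b = composition_vector(species_b, 3)
dist_ab = composition_distance(vec_a, vec_b)
```

Under these assumptions the distance lies between 0 (identical vectors) and 1 (opposite vectors). With real proteomes and K = 5 or 6 the vectors become very sparse, which is why the sketch stores them as dictionaries rather than dense arrays.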

Publication types

  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Archaea / genetics*
  • Bacteria / genetics*
  • Base Sequence
  • Gene Expression Profiling / methods*
  • Gene Expression Regulation, Bacterial / genetics*
  • Genome, Bacterial
  • Molecular Sequence Data
  • Phylogeny
  • Prokaryotic Cells
  • Sequence Alignment
  • Sequence Analysis, DNA / methods*