Large-scale genome clustering across life based on a linguistic approach

Biosystems. 2005 Sep;81(3):208-22. doi: 10.1016/j.biosystems.2005.04.003.

Abstract

The availability of complete genome sequences makes new phylogenetic reconstructions possible, revealing genomic relationships among organisms. In the compositional-spectra (CS) approach proposed in our previous studies, any genomic sequence is characterized by the distribution of frequencies with which words (oligonucleotides) match it imperfectly. In the current application of CS-analysis, we analyzed the cluster structure of genomes across life. The compositional spectra show a clear three-group clustering of the compared prokaryotic and eukaryotic genomes. Unexpectedly, this grouping differs substantially from the classical Universal Tree of Life, which is structured around the three commonly recognized kingdoms: Eubacteria, Archaebacteria, and Eukarya. The CS-clustering is highly stable, which suggests that it reflects an objective feature of the genomes; its biological significance remains enigmatic and may result from convergent evolution driven by ecological selection. We believe that our approach offers a new and broader perspective, compared to traditional methods, for extracting genomic information of high evolutionary relevance.
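To make the idea of a compositional spectrum concrete, the following is a minimal sketch, not the authors' implementation. It counts, for each word in a fixed word set, how many windows of a sequence match that word with at most a given number of mismatches, and normalizes the counts to frequencies. The abstract does not state the word length, the word set, the mismatch threshold, or the distance used for clustering, so the function names, the parameters `max_mismatch` and `n_words`, the random word set, and the L1 distance are all assumptions chosen for illustration.

```python
import random


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def compositional_spectrum(seq, words, max_mismatch):
    """Frequency distribution of imperfect word matches in seq.

    For each word, count the windows of seq that differ from the word
    in at most max_mismatch positions, then normalize so that the
    frequencies sum to 1 (or are all 0.0 if nothing matches).
    """
    counts = []
    for w in words:
        k = len(w)
        n = sum(
            1
            for i in range(len(seq) - k + 1)
            if hamming(seq[i:i + k], w) <= max_mismatch
        )
        counts.append(n)
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def spectrum_distance(p, q):
    """L1 distance between two spectra (an assumed, illustrative choice)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def random_words(n_words, length, seed=0):
    """Hypothetical word set: random oligonucleotides of a fixed length."""
    rng = random.Random(seed)
    return [
        "".join(rng.choice("ACGT") for _ in range(length))
        for _ in range(n_words)
    ]


if __name__ == "__main__":
    # Toy sequences standing in for genomes; real input would be far longer.
    words = random_words(n_words=20, length=6)
    spec_a = compositional_spectrum("ACGTACGTTTGACCA" * 10, words, max_mismatch=2)
    spec_b = compositional_spectrum("GGGCCCGGGCCCATA" * 10, words, max_mismatch=2)
    print(spectrum_distance(spec_a, spec_b))
```

A matrix of pairwise distances between such spectra could then be passed to any standard clustering procedure; which procedure the study used is not stated in the abstract.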

Publication types

  • Comparative Study

MeSH terms

  • Base Composition
  • Base Sequence / genetics
  • Classification / methods*
  • Cluster Analysis
  • Computational Biology / methods
  • Genome / genetics*
  • Genomics / methods*
  • Oligonucleotides / genetics*
  • Phylogeny*
  • Species Specificity

Substances

  • Oligonucleotides