Application of tetranucleotide frequencies for the assignment of genomic fragments

Environ Microbiol. 2004 Sep;6(9):938-47. doi: 10.1111/j.1462-2920.2004.00624.x.

Abstract

A basic problem of the metagenomic approach in microbial ecology is the assignment of genomic fragments to a certain species or taxonomic group, when suitable marker genes are absent. Currently, the (G + C)-content together with phylogenetic information and codon adaptation for functional genes is mostly used to assess the relationship of different fragments. These methods, however, can produce ambiguous results. In order to evaluate sequence-based methods for fragment identification, we extensively compared (G + C)-contents and tetranucleotide usage patterns of 9054 fosmid-sized genomic fragments generated in silico from 118 completely sequenced bacterial genomes (40 982 931 fragment pairs were compared in total). The results of this systematic study show that the discriminatory power of correlations of tetranucleotide-derived z-scores is by far superior to that of differences in (G + C)-content and provides reasonable assignment probabilities when applied to metagenome libraries of small diversity. Using six fully sequenced fosmid inserts from a metagenomic analysis of microbial consortia mediating the anaerobic oxidation of methane (AOM), we demonstrate that discrimination based on tetranucleotide-derived z-score correlations was consistent with corresponding data from 16S ribosomal RNA sequence analysis and allowed us to discriminate between fosmid inserts that were indistinguishable with respect to their (G + C)-contents.

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bacteria / genetics*
  • Base Composition
  • Databases, Genetic
  • Genome, Bacterial*
  • Genomics / methods*
  • Likelihood Functions
  • Nucleotides / genetics*
  • Species Specificity

Substances

  • Nucleotides