Trends between gene content and genome size in prokaryotic species with larger genomes

Proc Natl Acad Sci U S A. 2004 Mar 2;101(9):3160-5. doi: 10.1073/pnas.0308653100. Epub 2004 Feb 18.

Abstract

Although the evolutionary process and ecological benefits of symbiotic species with small genomes are well understood, these issues remain poorly elucidated for free-living species with large genomes. We compared 115 completed prokaryotic genomes by using the Clusters of Orthologous Groups database to determine whether the proportion of the genome attributable to particular cellular processes changes with genome size, because such changes may reflect both cellular and ecological strategies associated with genome expansion. We found that large genomes are disproportionately enriched in regulation and secondary metabolism genes and depleted in protein translation, DNA replication, cell division, and nucleotide metabolism genes compared with medium- and small-sized genomes. Furthermore, large genomes do not accumulate noncoding DNA or hypothetical ORFs, because the portion of the genome devoted to these categories remained constant with genome size. For cell functions that showed no correlation with genome size, the dispersion around the mean reflects traits other than genome size or strain-specific processes. For example, Archaea had significantly more genes in energy production, coenzyme metabolism, and the poorly characterized category, and fewer in cell membrane biogenesis and carbohydrate metabolism, than Bacteria. The trends we noted with genome size by using Clusters of Orthologous Groups were confirmed by our independent analysis with The Institute for Genomic Research's Comprehensive Microbial Resource and the Kyoto Encyclopedia of Genes and Genomes' Orthology annotation databases. These trends suggest that species with larger genomes may dominate in environments where resources are scarce but diverse and where there is little penalty for slow growth, such as soil.
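
The comparison described in the abstract can be illustrated with a minimal sketch: for each genome, compute the share of genes assigned to a functional category, then check whether that share rises or falls with genome size. Everything here is an assumption made for illustration and is not taken from the paper: the function names, the use of total annotated gene count as a stand-in for genome size, the choice of Pearson correlation, and the toy numbers (which are invented, not real genome data).

```python
from math import sqrt


def category_fractions(counts):
    """Return each category's share of a genome's annotated genes."""
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def trend_with_genome_size(genomes, category):
    """Correlate a category's gene share with total gene count across genomes.

    Assumption: total annotated gene count is used as a proxy for genome size.
    """
    sizes = [sum(g.values()) for g in genomes]
    shares = [category_fractions(g).get(category, 0.0) for g in genomes]
    return pearson_r(sizes, shares)


# Hypothetical toy data (invented for illustration, NOT real genomes):
# per-genome gene counts by functional category.
toy_genomes = [
    {"translation": 100, "regulation": 20, "other": 380},    # 500 genes
    {"translation": 150, "regulation": 150, "other": 1700},  # 2000 genes
    {"translation": 200, "regulation": 500, "other": 4300},  # 5000 genes
]

regulation_trend = trend_with_genome_size(toy_genomes, "regulation")
translation_trend = trend_with_genome_size(toy_genomes, "translation")
```

In this toy set the regulation share grows with gene count and the translation share shrinks, so the two correlations have opposite signs, mirroring the direction of the trends the abstract reports. A real analysis would read category assignments from an annotation database and would need many more genomes before any correlation is meaningful.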

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Animals
  • Cell Division / genetics
  • Cell Physiological Phenomena
  • DNA / genetics
  • DNA Replication / genetics
  • Energy Metabolism / genetics
  • Genome*
  • Genome, Bacterial
  • Metabolism / genetics
  • Models, Genetic
  • Nucleotides / metabolism
  • Open Reading Frames
  • Protein Biosynthesis

Substances

  • Nucleotides
  • DNA