Estimate of the sequenced proportion of the global prokaryotic genome

Microbiome. 2020 Sep 16;8(1):134. doi: 10.1186/s40168-020-00903-z.

Abstract

Background: Sequencing prokaryotic genomes has revolutionized our understanding of the many roles played by microorganisms. However, the proportions of bacterial and archaeal cells and taxa on Earth that have sequenced genomes remain unknown. This study aimed to address this basic question using large-scale alignment between the sequences released by the Earth Microbiome Project and 155,810 prokaryotic genomes from public databases.

Results: The median proportions of genome-sequenced cells and taxa (at 100% identity in the 16S-V4 region) in different biomes reached 38.1% (16.4-86.3%) and 18.8% (9.1-52.6%), respectively. The sequenced proportions of the prokaryotic genomes in biomes were significantly negatively correlated with the alpha diversity indices, and the proportions sequenced in host-associated biomes were significantly higher than those in free-living biomes. Because a set of cosmopolitan OTUs found in multiple samples is preferentially sequenced, only 2.1% of the global prokaryotic taxa are represented by sequenced genomes. Most of the biomes were occupied by a few predominant taxa with a high relative abundance and much higher genome-sequenced proportions than the numerous rare taxa.
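The distinction between the cell proportion (weighted by abundance) and the taxon proportion (unweighted), and the reason the global taxon proportion can fall below the per-biome values, can be illustrated with a minimal sketch. The OTU names, counts, and function names below are hypothetical toy values chosen for illustration; they are not the study's data or pipeline.

```python
from statistics import median

# Hypothetical toy OTU table: sample -> {OTU: read count}. Not study data.
samples = {
    "gut":   {"otu_a": 900, "otu_b": 50,  "otu_c": 50},
    "soil":  {"otu_a": 400, "otu_d": 300, "otu_e": 300},
    "water": {"otu_a": 500, "otu_f": 250, "otu_g": 250},
}

# Hypothetical set of OTUs matching a sequenced genome at 100% identity.
# Here only the cosmopolitan OTU (present in every sample) is sequenced.
sequenced = {"otu_a"}


def cell_proportion(counts, sequenced_otus):
    """Abundance-weighted share of reads belonging to sequenced OTUs."""
    total = sum(counts.values())
    hit = sum(n for otu, n in counts.items() if otu in sequenced_otus)
    return hit / total


def taxon_proportion(otus, sequenced_otus):
    """Unweighted share of distinct OTUs that are sequenced."""
    otus = set(otus)
    return len(otus & sequenced_otus) / len(otus)


per_sample_cells = {s: cell_proportion(c, sequenced) for s, c in samples.items()}
per_sample_taxa = {s: taxon_proportion(c, sequenced) for s, c in samples.items()}

# Global taxon proportion: pool the distinct OTUs of all samples first.
all_otus = set().union(*samples.values())
global_taxa = taxon_proportion(all_otus, sequenced)

print("median cell proportion: ", median(per_sample_cells.values()))
print("median taxon proportion:", median(per_sample_taxa.values()))
print("global taxon proportion:", global_taxa)
```

In this toy table the single sequenced OTU is counted once in every sample but only once in the pooled set, so the median per-sample taxon proportion (1/3) exceeds the global one (1/7), and its high abundance pushes the cell proportions higher still.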

Conclusions: These results reveal the current state of prokaryotic genome sequencing across Earth's biomes, support a more rational and efficient exploration of prokaryotic genomes, and advance our understanding of microbial ecological functions. Video Abstract.

Keywords: Earth Microbiome Project; Genome sequencing; Microbiome; Predominant taxa; Prokaryotic biome.

Publication types

  • Research Support, Non-U.S. Gov't
  • Video-Audio Media

MeSH terms

  • Archaea / classification
  • Archaea / genetics
  • Archaea / isolation & purification
  • Bacteria / classification
  • Bacteria / genetics
  • Bacteria / isolation & purification
  • Databases, Genetic
  • Earth, Planet*
  • Genome / genetics*
  • Genomics / statistics & numerical data*
  • Microbiota / genetics*
  • Prokaryotic Cells / classification*
  • Prokaryotic Cells / metabolism*
  • Sequence Alignment
  • Sequence Analysis / statistics & numerical data*