Metagenomics Databases for Bacteria

Methods Mol Biol. 2023:2649:55-67. doi: 10.1007/978-1-0716-3072-3_3.

Abstract

Rapid advances in sequencing technologies have made metagenomics a widely used tool for microbial studies, especially in clinical medicine and ecology. Accordingly, the toolkit for metagenomics data analysis has expanded, offering multiple approaches for addressing diverse biological questions and for understanding the composition and function of the microbiome. As part of this toolkit, metagenomics databases play a central role in creating and maintaining processed data, such as taxonomic classifications, gene function annotations, sequence alignments, and inferred phylogenetic trees. The availability of large numbers of high-quality bacterial genomic sequences contributes significantly to the construction and updating of metagenomics databases, which constitute the core resource for metagenomics data analysis at various scales. This chapter presents the key concepts, technical options, and challenges for metagenomics projects, as well as the curation processes and versatile functions of four representative bacterial metagenomics databases: Greengenes, SILVA, the Ribosomal Database Project (RDP), and the Genome Taxonomy Database (GTDB).

Keywords: Database; Genome Taxonomy Database (GTDB); Greengenes; Metagenomics; Ribosomal Database Project (RDP); SILVA.

MeSH terms

  • Bacteria / genetics
  • Databases, Genetic
  • Metagenomics*
  • Microbiota* / genetics
  • Phylogeny
  • RNA, Ribosomal, 16S / genetics

Substances

  • RNA, Ribosomal, 16S