Recovering prokaryotic genomes from host-associated, short-read shotgun metagenomic sequencing data

Nat Protoc. 2021 May;16(5):2520-2541. doi: 10.1038/s41596-021-00508-2. Epub 2021 Apr 16.

Abstract

Recovering genomes from shotgun metagenomic sequence data allows detailed taxonomic and functional characterization of individual species or strains in a microbial community. Retrieving these metagenome-assembled genomes (MAGs) involves seven stages. First, low-quality bases, along with adapter and host sequences, are removed. Second, overlapping sequences are assembled to create longer contiguous fragments. Third, these fragments are clustered based on sequence composition and abundance. Fourth, these sequence clusters, or bins, undergo rounds of quality assessment and refinement to yield MAGs. The optional fifth stage is dereplication of MAGs to select representatives. Next, each MAG is taxonomically classified. The optional seventh stage is assessing the fraction of diversity that has been recovered. The output of this protocol is draft genomes, which can provide invaluable clues about uncultured organisms. This protocol takes ~1 week to run, depending on computational resources available, and requires prior experience with high-performance computing, shell script programming and Python.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Metagenomics / methods*
  • Prokaryotic Cells / metabolism*
  • Sequence Analysis, DNA*
  • Software
  • Time Factors