Efficient Recovery of Complete Gut Viral Genomes by Combined Short- and Long-Read Sequencing

Adv Sci (Weinh). 2024 Apr;11(13):e2305818. doi: 10.1002/advs.202305818. Epub 2024 Jan 19.

Abstract

Current metagenome-assembled human gut phage catalogs contain mostly fragmented genomes. Here, a comprehensive gut virome detection procedure is developed, involving virus-like particle (VLP) enrichment from ≈500 g of feces and combined short- and long-read sequencing. Applied to 135 samples, the procedure yields a Chinese Gut Virome Catalog (CHGV) of 21,499 non-redundant viral operational taxonomic units (vOTUs). These vOTUs are significantly longer than those obtained by short-read sequencing alone, and ≈35% (7,675) of them are complete genomes, a proportion ≈9 times higher than that in the Gut Virome Database (GVD, ≈4%, 1,443). Interestingly, the majority (≈60%, 13,356) of the CHGV vOTUs are obtained by either long-read or hybrid assemblies and show little overlap with those assembled from short-read data alone. With this dataset, the vast diversity of the gut virome is elucidated, including 32% (6,962) vOTUs that are novel compared with public gut virome databases, dozens of phages that are more prevalent than crAssphages and/or Gubaphages, and several viral clades that are more diverse than these two. Finally, the functional capacities of CHGV-encoded proteins are characterized, and a virus-host interaction network is constructed to facilitate future research and applications.
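As a quick consistency check on the figures above, the short Python sketch below recomputes the reported percentages from the stated counts only. The GVD total is not given in the abstract, so its ≈4% share of complete genomes is used as reported; the constant and function names are illustrative and not taken from the paper.

```python
# Counts reported in the abstract for the CHGV catalog.
CHGV_TOTAL_VOTUS = 21_499
CHGV_COMPLETE = 7_675
CHGV_LONG_OR_HYBRID = 13_356
CHGV_NOVEL = 6_962

# GVD figures as stated in the abstract. The GVD total is not given there,
# so only the reported fraction (~4%) and the complete-genome count are used.
GVD_COMPLETE_FRACTION = 0.04
GVD_COMPLETE = 1_443


def fraction(part: int, total: int) -> float:
    """Return part/total as a fraction in [0, 1]."""
    return part / total


complete_frac = fraction(CHGV_COMPLETE, CHGV_TOTAL_VOTUS)

summary = {
    "complete_pct": round(100 * complete_frac, 1),
    "long_or_hybrid_pct": round(100 * fraction(CHGV_LONG_OR_HYBRID, CHGV_TOTAL_VOTUS), 1),
    "novel_pct": round(100 * fraction(CHGV_NOVEL, CHGV_TOTAL_VOTUS), 1),
    # Fold difference in the *proportion* of complete genomes (CHGV vs GVD).
    "fold_by_proportion": round(complete_frac / GVD_COMPLETE_FRACTION, 1),
    # Fold difference in the *absolute count* of complete genomes.
    "fold_by_count": round(CHGV_COMPLETE / GVD_COMPLETE, 1),
}

for key, value in summary.items():
    print(f"{key}: {value}")
```

Run as-is, this gives a complete-genome share of about 35.7% for CHGV, about 62.1% long-read or hybrid vOTUs, and about 32.4% novel vOTUs. The ≈9-fold figure therefore describes the proportion of complete genomes (35.7% vs ≈4%); by absolute count, CHGV contains roughly 5.3 times as many complete genomes as GVD (7,675 vs 1,443).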

Keywords: crAssphage; gubaphage; gut virome; long‐read sequencing; PacBio Sequel II; terminase; virus‐like particle.

MeSH terms

  • Bacteriophages* / genetics
  • Feces
  • Genome, Viral / genetics
  • Humans
  • Metagenome / genetics
  • Sequence Analysis