Genome-Guided Discovery of Natural Products through Multiplexed Low-Coverage Whole-Genome Sequencing of Soil Actinomycetes on Oxford Nanopore Flongle

mSystems. 2021 Dec 21;6(6):e0102021. doi: 10.1128/mSystems.01020-21. Epub 2021 Nov 23.

Abstract

Genome mining is an important tool for discovery of new natural products; however, the number of publicly available genomes for natural product-rich microbes such as actinomycetes, relative to human pathogens with smaller genomes, is small. To obtain contiguous DNA assemblies and identify large (ca. 10 to greater than 100 kb) biosynthetic gene clusters (BGCs) with high GC (>70%) and high-repeat content, it is necessary to use long-read sequencing methods when sequencing actinomycete genomes. One of the hurdles to long-read sequencing is the higher cost. In the current study, we assessed Flongle, a recently launched platform by Oxford Nanopore Technologies, as a low-cost DNA sequencing option to obtain contiguous DNA assemblies and analyze BGCs. To make the workflow more cost-effective, we multiplexed up to four samples in a single Flongle sequencing experiment while expecting low-sequencing coverage per sample. We hypothesized that contiguous DNA assemblies might enable analysis of BGCs even at low sequencing depth. To assess the value of these assemblies, we collected high-resolution mass spectrometry data and conducted a multi-omics analysis to connect BGCs to secondary metabolites. In total, we assembled genomes for 20 distinct strains across seven sequencing experiments. In each experiment, 50% of the bases were in reads longer than 10 kb, which facilitated the assembly of reads into contigs with an average N50 value of 3.5 Mb. The programs antiSMASH and PRISM predicted 629 and 295 BGCs, respectively. We connected BGCs to metabolites for N,N-dimethyl cyclic-di-tryptophan, two novel lasso peptides, and three known actinomycete-associated siderophores, namely, mirubactin, heterobactin, and salinichelin. IMPORTANCE Short-read sequencing of GC-rich genomes such as those from actinomycetes results in a fragmented genome assembly and truncated biosynthetic gene clusters (often 10 to >100 kb long), which hinders our ability to understand the biosynthetic potential of a given strain and predict the molecules that can be produced. The current study demonstrates that contiguous DNA assemblies, suitable for analysis of BGCs, can be obtained through low-coverage, multiplexed sequencing on Flongle, which provides a new low-cost workflow ($30 to 40 per strain) for sequencing actinomycete strain libraries.

Keywords: actinomycetes; bioinformatics; biosynthetic gene cluster; cyclic dipeptide; genomics; glycopeptide; lasso peptide; mass spectrometry; metabolomics; multi-omics; natural products.