DeepRiPP integrates multiomics data to automate discovery of novel ribosomally synthesized natural products

Proc Natl Acad Sci U S A. 2020 Jan 7;117(1):371-380. doi: 10.1073/pnas.1901493116. Epub 2019 Dec 23.

Abstract

Microbial natural products represent a rich resource of evolved chemistry that forms the basis for the majority of pharmacotherapeutics. Ribosomally synthesized and posttranslationally modified peptides (RiPPs) are a particularly interesting class of natural products noted for their unique mode of biosynthesis and biological activities. Analyses of sequenced microbial genomes have revealed an enormous number of biosynthetic loci that encode RiPPs but whose products remain cryptic. In parallel, analyses of bacterial metabolomes typically assign chemical structures to only a minority of detected metabolites. Aligning these 2 disparate sources of data could provide a comprehensive strategy for natural product discovery. Here we present DeepRiPP, an integrated genomic and metabolomic platform that employs machine learning to automate the selective discovery and isolation of novel RiPPs. DeepRiPP includes 3 modules. The first, NLPPrecursor, identifies RiPPs independent of genomic context and neighboring biosynthetic genes. The second, BARLEY, prioritizes loci that encode novel compounds, while the third, CLAMS, automates the isolation of their corresponding products from complex bacterial extracts. DeepRiPP pinpoints target metabolites using large-scale comparative metabolomics analysis across a database of 10,498 extracts generated from 463 strains. We apply the DeepRiPP platform to expand the landscape of novel RiPPs encoded within sequenced genomes and to discover 3 novel RiPPs, whose structures are exactly as predicted by our platform. By building on advances in machine learning technologies, DeepRiPP integrates genomic and metabolomic data to guide the isolation of novel RiPPs in an automated manner.

Keywords: RiPPs; genome mining; machine learning; metabolomics; natural products.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bacteria / genetics
  • Bacteria / metabolism
  • Bacterial Proteins / biosynthesis
  • Bacterial Proteins / genetics
  • Bacterial Proteins / isolation & purification*
  • Biological Products / isolation & purification*
  • Biological Products / metabolism
  • Drug Discovery / methods*
  • Genomics / methods
  • Machine Learning
  • Metabolomics / methods
  • Peptide Biosynthesis / genetics
  • Peptides / genetics
  • Peptides / isolation & purification*
  • Peptides / metabolism
  • Protein Processing, Post-Translational
  • Ribosomes / metabolism
  • Software*

Substances

  • Bacterial Proteins
  • Biological Products
  • Peptides