Diversity and taxonomic distribution of bacterial biosynthetic gene clusters predicted to produce compounds with therapeutically relevant bioactivities

J Ind Microbiol Biotechnol. 2023 Feb 17;50(1):kuad024. doi: 10.1093/jimb/kuad024.

Abstract

Bacteria have long been a source of natural products with diverse bioactivities that have been developed into therapeutics to treat human disease. Historically, researchers have focused on a few taxa of bacteria, mainly Streptomyces and other actinomycetes. This strategy was initially highly successful and resulted in the golden era of antibiotic discovery. The golden era ended when the most common antibiotics from Streptomyces had been discovered. Rediscovery of known compounds has plagued natural product discovery ever since. Recently, there has been increasing interest in identifying other taxa that produce bioactive natural products. Several bioinformatics studies have identified promising taxa with high biosynthetic capacity. However, these studies do not address whether any of the natural products made by these taxa are likely to have activities that would make them useful as human therapeutics. We address this gap by applying a recently developed machine learning tool that predicts natural product activity from biosynthetic gene cluster (BGC) sequences to determine which taxa are likely to produce compounds that are not only novel but also bioactive. This machine learning tool is trained on a dataset of BGC-natural product activity pairs and relies on counts of different protein domains and resistance genes in the BGC to make its predictions. We find that rare and understudied actinomycetes are the most promising sources for novel active compounds. Several taxa outside the actinomycetes are also likely to produce novel active compounds. We also find that most strains of Streptomyces likely produce both characterized and uncharacterized bioactive natural products. The results of this study provide guidelines to increase the efficiency of future bioprospecting efforts.
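
The count-based input described in the abstract can be illustrated with a minimal sketch. It covers only the featurization step, turning the annotations found in one BGC into a fixed-length vector of counts, and is not the published tool's implementation. The feature names, the `featurize` helper, and the toy cluster are all hypothetical placeholders chosen for illustration; the abstract does not specify the feature vocabulary or the classifier that consumes these vectors.

```python
from collections import Counter

# Hypothetical feature vocabulary (an assumption for illustration only):
# a few protein-domain labels plus one resistance-gene label.
FEATURES = [
    "ketosynthase",
    "acyltransferase",
    "condensation",
    "adenylation",
    "resistance_gene_X",
]


def featurize(annotations, vocabulary=FEATURES):
    """Return a count vector for one BGC.

    `annotations` is a list of labels (domains or resistance genes)
    detected in the cluster. Labels outside the vocabulary are ignored.
    """
    counts = Counter(annotations)
    return [counts[name] for name in vocabulary]


# Toy BGC: two ketosynthase domains, one acyltransferase domain,
# one resistance gene, and one label that is not in the vocabulary.
toy_bgc = [
    "ketosynthase",
    "acyltransferase",
    "ketosynthase",
    "resistance_gene_X",
    "unlisted_domain",
]

vector = featurize(toy_bgc)
print(vector)  # [2, 1, 0, 0, 1]
```

Vectors of this kind, paired with known activity labels, are the sort of input a supervised classifier could be trained on; which classifier the tool actually uses is not stated in the abstract.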

One-sentence summary: This paper combines several bioinformatics workflows to identify which genera of bacteria are most likely to produce novel natural products with useful bioactivities such as antibacterial, antitumor, or antifungal activity.

Keywords: Genome mining; Machine learning; Natural products.

MeSH terms

  • Actinobacteria* / genetics
  • Actinobacteria* / metabolism
  • Actinomyces / genetics
  • Biological Products* / metabolism
  • Biological Products* / pharmacology
  • Computational Biology
  • Humans
  • Multigene Family

Substances

  • Biological Products