DISCo-microbe: design of an identifiable synthetic community of microbes

PeerJ. 2020 Feb 27:8:e8534. doi: 10.7717/peerj.8534. eCollection 2020.

Abstract

Background: Microbiomes are extremely important for their host organisms, providing many vital functions and extending their hosts' phenotypes. Natural studies of host-associated microbiomes can be difficult to interpret due to the high complexity of microbial communities, which hinders our ability to track and identify individual members along with the many factors that structure or perturb those communities. For this reason, researchers have turned to synthetic or constructed communities in which the identities of all members are known. However, due to the lack of tracking methods and the difficulty of creating a more diverse and identifiable community that can be distinguished through next-generation sequencing, most such in vivo studies have used only a few strains.

Results: To address this issue, we developed DISCo-microbe, a program for the design of an identifiable synthetic community of microbes for use in in vivo experimentation. The program is composed of two modules; (1) create, which allows the user to generate a highly diverse community list from an input DNA sequence alignment using a custom nucleotide distance algorithm, and (2) subsample, which subsamples the community list to either represent a number of grouping variables, including taxonomic proportions, or to reach a user-specified maximum number of community members. As an example, we demonstrate the generation of a synthetic microbial community that can be distinguished through amplicon sequencing. The synthetic microbial community in this example consisted of 2,122 members from a starting DNA sequence alignment of 10,000 16S rRNA sequences from the Ribosomal Database Project. We generated simulated Illumina sequencing data from the constructed community and demonstrate that DISCo-microbe is capable of designing diverse communities with members distinguishable by amplicon sequencing. Using the simulated data we were able to recover sequences from between 97-100% of community members using two different post-processing workflows. Furthermore, 97-99% of sequences were assigned to a community member with zero sequences being misidentified. We then subsampled the community list using taxonomic proportions to mimic a natural plant host-associated microbiome, ultimately yielding a diverse community of 784 members.

Conclusions: DISCo-microbe can create a highly diverse community list of microbes that can be distinguished through 16S rRNA gene sequencing, and has the ability to subsample (i.e., design) the community for the desired number of members and taxonomic proportions. Although developed for bacteria, the program allows for any alignment input from any taxonomic group, making it broadly applicable. The software and data are freely available from GitHub (https://github.com/dlcarper/DISCo-microbe) and Python Package Index (PYPI).

Keywords: 16S rRNA; Constructed community; In vivo experimentation; Microbiome; Synthetic community; Taxonomic profiling.

Grants and funding

This research was sponsored by the Genomic Science Program, U.S. Department of Energy, Office of Science, Biological and Environmental Research as part of the Plant Microbe Interfaces Scientific Focus Area. Oak Ridge National Laboratory is managed by UT-Battelle, LLC, for the U.S. Department of Energy under contract DE-AC05-00OR22725. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.