Machine Learning Predicts Biogeochemistry from Microbial Community Structure in a Complex Model System

Microbiol Spectr. 2022 Feb 23;10(1):e0190921. doi: 10.1128/spectrum.01909-21. Epub 2022 Feb 9.

Abstract

Microbial community structure is influenced by the environment and, in turn, exerts control over many environmental parameters. We applied this concept in a bioreactor study to test whether microbial community structure contains sufficient information to predict the concentration of H2S, the product of sulfate reduction. Microbial sulfate reduction is a major source of H2S in many industrial and environmental systems and is often influenced by the prevailing physicochemical conditions. Production of H2S in industrial systems creates occupational hazards and degrades product quality. A long-term (148-day) experiment was conducted in upflow bioreactors to mimic sulfidogenesis, followed by inhibition with nitrate salts and a resumption of H2S generation once the inhibition was released. We determined microbial community structure in 731 samples across 20 bioreactors using 16S rRNA gene sequencing and applied a random forest algorithm that successfully predicted the different phases of sulfidogenesis and mitigation (accuracy = 93.17%) and distinguished sessile from effluent microbial communities (accuracy = 100%). Similarly derived regression models that also included cell abundances predicted H2S concentration with remarkably high fidelity (R² > 0.82). Metabolic profiles based on microbial community structure were also reliable predictors of H2S concentration (R² = 0.78). These results suggest that microbial community structure contains sufficient information to predict sulfidogenesis in a closed system, with anticipated applications to microbially driven processes in open environments.

IMPORTANCE Microbial communities control many biogeochemical processes, many of which are impractical or expensive to measure directly. Because the taxonomic structure of a microbial community is indicative of its function, it encodes information that can be used to predict biogeochemistry.
Here, we demonstrate how a machine learning technique can be used to predict sulfidogenesis, a key biogeochemical process, in a model system. A distinctive feature of this research was the ability to predict H2S production in a bioreactor from the effluent bacterial community structure alone, without direct observation of the sessile community or other environmental conditions. This study establishes that machine learning approaches can predict sulfide concentrations in a closed system and could be further developed into a valuable tool for predicting biogeochemical processes in open environments. As machine learning algorithms continue to improve, we anticipate that microbial community structure will increasingly be used to predict key environmental and industrial processes.
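The abstract does not state which software, hyperparameters, or feature preprocessing the authors used. Purely as an illustration of the general technique, here is a minimal standard-library Python sketch of random forest regression: bootstrap resampling, a random subset of features at each split, and averaging of tree predictions. It is used to predict an H2S value from taxon relative abundances and is scored with R². Everything here is an assumption for illustration. The data are synthetic (six hypothetical taxa, with H2S assumed to depend on taxon 0 as a stand-in sulfate reducer). The function names (`synthetic_community`, `fit_forest`, and so on), tree depth, tree count, and features per split are hypothetical choices, not the study's settings.

```python
import random

def synthetic_community(n, n_taxa, rng):
    """HYPOTHETICAL data: relative abundances per sample; H2S assumed driven by taxon 0."""
    X, y = [], []
    for _ in range(n):
        raw = [rng.random() for _ in range(n_taxa)]
        total = sum(raw)
        rel = [r / total for r in raw]               # relative abundances sum to 1
        X.append(rel)
        y.append(500 * rel[0] + rng.gauss(0, 5))     # assumed linear signal plus noise
    return X, y

def best_split(X, y, idx, feats, min_leaf):
    """Return (sse, feature, threshold) minimising the children's summed SSE, or None."""
    best = None
    for f in feats:
        order = sorted(idx, key=lambda i: X[i][f])
        n = len(order)
        total = sum(y[i] for i in order)
        total_sq = sum(y[i] ** 2 for i in order)
        s = sq = 0.0
        for k in range(1, n):                        # left child = first k samples
            yi = y[order[k - 1]]
            s += yi
            sq += yi * yi
            if k < min_leaf or n - k < min_leaf:
                continue
            lo, hi = X[order[k - 1]][f], X[order[k]][f]
            if lo == hi:
                continue                             # cannot split between equal values
            # SSE of a child = sum(y^2) - (sum y)^2 / count
            sse = (sq - s * s / k) + ((total_sq - sq) - (total - s) ** 2 / (n - k))
            if best is None or sse < best[0]:
                best = (sse, f, (lo + hi) / 2)
    return best

def grow(X, y, idx, depth, min_leaf, n_feats, rng):
    """Recursively grow a regression tree; leaves hold the mean target value."""
    mean = sum(y[i] for i in idx) / len(idx)
    if depth == 0 or len(idx) < 2 * min_leaf:
        return mean
    feats = rng.sample(range(len(X[0])), n_feats)    # random feature subset per split
    split = best_split(X, y, idx, feats, min_leaf)
    if split is None:
        return mean
    _, f, t = split
    left = [i for i in idx if X[i][f] <= t]
    right = [i for i in idx if X[i][f] > t]
    return (f, t, grow(X, y, left, depth - 1, min_leaf, n_feats, rng),
            grow(X, y, right, depth - 1, min_leaf, n_feats, rng))

def predict_tree(node, row):
    """Walk internal nodes (f, t, left, right) until reaching a leaf value."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_forest(X, y, n_trees=30, depth=6, min_leaf=2, n_feats=None, seed=0):
    """Fit each tree on a bootstrap sample of the training rows."""
    rng = random.Random(seed)
    n_feats = n_feats or max(1, len(X[0]) // 3)
    forest = []
    for _ in range(n_trees):
        boot = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(grow(X, y, boot, depth, min_leaf, n_feats, rng))
    return forest

def predict_forest(forest, row):
    """Forest prediction = mean of the individual tree predictions."""
    return sum(predict_tree(t, row) for t in forest) / len(forest)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1 - ss_res / ss_tot

if __name__ == "__main__":
    rng = random.Random(42)
    X, y = synthetic_community(120, 6, rng)
    forest = fit_forest(X[:80], y[:80], n_trees=50, depth=8, n_feats=3)
    preds = [predict_forest(forest, row) for row in X[80:]]
    print(f"held-out R^2 = {r_squared(y[80:], preds):.2f}")
```

The study's regression models also included cell abundances. In a sketch like this, that would simply mean appending an extra column to each feature row. A classifier for the sulfidogenesis phases would follow the same pattern, using majority-vote leaves and an impurity measure such as Gini in place of the mean and SSE.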

Keywords: biogeochemical state; machine learning; microbial community analysis; random forest; sulfidogenesis potential.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bioreactors / microbiology
  • Forests*
  • Machine Learning*
  • Microbiota / genetics
  • Microbiota / physiology*
  • Models, Biological*
  • RNA, Ribosomal, 16S / genetics
  • Trees / microbiology*

Substances

  • RNA, Ribosomal, 16S