Statistical modeling of gut microbiota for personalized health status monitoring

Microbiome. 2023 Aug 18;11(1):184. doi: 10.1186/s40168-023-01614-x.

Abstract

Background: The gut microbiome is closely associated with health status, and microbiota dysbiosis can considerably impact the host's health. In addition, active consortium projects have generated many reference datasets that are available for large-scale retrospective research. However, a comprehensive monitoring framework that analyzes health status and quantitatively presents the contribution of each bacterium to health has not been thoroughly investigated.

Methods: We systematically developed a statistical monitoring diagram for personalized health status prediction and analysis. Our framework comprises three elements: (1) a statistical monitoring model was established, the health index was constructed, and the health boundary was defined; (2) healthy patterns were identified among healthy people and analyzed using contrast learning; (3) the contribution of each bacterium to the health index of the diseased population was analyzed. Furthermore, we investigated disease proximity using the contribution spectrum and discovered several targets related to multiple diseases.
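The abstract does not state which statistic defines the health index or the health boundary. As a minimal sketch only, the three steps of element (1) and the per-bacterium contribution of element (3) can be illustrated with a technique common in statistical process monitoring: principal component analysis fitted on healthy samples, Hotelling's T² as a stand-in index, and an empirical quantile of the healthy T² values as a stand-in boundary. The function names, the number of components, and the 0.95 quantile are all assumptions made for illustration, not the authors' method.

```python
import numpy as np

def t2_index(model, samples):
    """Hotelling's T^2 per sample (hypothetical stand-in for the health index)."""
    z = (samples - model["mean"]) / model["std"]
    scores = z @ model["loadings"]
    return np.sum(scores ** 2 / model["variances"], axis=1)

def fit_monitor(healthy, n_components=2, quantile=0.95):
    """Fit PCA on healthy samples (rows = samples, columns = taxa).

    n_components and quantile are illustrative assumptions.
    """
    mean = healthy.mean(axis=0)
    std = healthy.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # guard against taxa with no variation
    z = (healthy - mean) / std
    # PCA via singular value decomposition of the standardized data
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    model = {
        "mean": mean,
        "std": std,
        "loadings": vt[:n_components].T,  # shape: (taxa, components)
        "variances": s[:n_components] ** 2 / (len(healthy) - 1),
    }
    # Assumed boundary: empirical quantile of T^2 among healthy samples
    model["boundary"] = np.quantile(t2_index(model, healthy), quantile)
    return model

def contributions(model, sample):
    """Per-taxon contributions to T^2 for one sample; they sum to its T^2."""
    z = (sample - model["mean"]) / model["std"]
    scores = z @ model["loadings"]
    weighted = scores / model["variances"]
    return z * (model["loadings"] @ weighted)
```

Under these assumptions, a sample whose index exceeds `model["boundary"]` would be flagged as outside the healthy region, and the taxa with the largest entries in `contributions` would be the ones driving that departure. Individual contributions can be negative; only their sum is guaranteed to equal the index.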

Results: We demonstrated and evaluated the effectiveness of the proposed monitoring framework for tracking personalized health status through comprehensive real-data analysis using a multi-study cohort and a separate validation cohort. A statistical monitoring model was developed based on 92 microbial taxa. In the discovery and validation sets, respectively, our approach achieved balanced accuracies of 0.7132 and 0.7026 and AUCs of 0.80 and 0.76. Four health patterns were identified in healthy populations, highlighting variations in species composition and metabolic function across these patterns. Furthermore, a reasonable correlation was found between the proposed health index and host physiological indicators, diversity, and functional redundancy. The health index significantly correlated with Shannon diversity ([Formula: see text]) and species richness ([Formula: see text]) in the healthy samples. However, in samples from individuals with diseases, the health index significantly correlated with age ([Formula: see text]), species richness ([Formula: see text]), and functional redundancy ([Formula: see text]). Personalized diagnosis is achieved by analyzing the contribution of each bacterium to the health index. We identified high-contribution species shared across multiple diseases by analyzing the contribution spectrum of these diseases.
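The quantities named in the results have standard textbook definitions, sketched below for reference. The sketch assumes the natural logarithm for Shannon diversity, counts species with nonzero abundance as richness, and uses a hypothetical label coding of 1 for healthy and 0 for diseased. The abstract does not state which conventions the authors used.

```python
import math

def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def species_richness(counts):
    """Number of taxa observed at least once in the sample."""
    return sum(1 for c in counts if c > 0)

def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 or 0)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    return 0.5 * (tp / pos + tn / neg)
```

Balanced accuracy averages the per-class recall, so it is less sensitive than plain accuracy to an unequal number of healthy and diseased samples.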

Conclusions: Our research revealed that the proposed monitoring framework could promote a deep understanding of healthy microbiomes and unhealthy variations and serve as a bridge toward individualized therapy target discovery and precise modulation. Video Abstract.

Keywords: Gut microbiome; Machine learning; Personalized health prediction; Principal component analysis; Statistical inference.

Publication types

  • Video-Audio Media
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Data Analysis
  • Gastrointestinal Microbiome* / genetics
  • Health Status
  • Humans
  • Microbiota*
  • Retrospective Studies