Automatic disease prediction from human gut metagenomic data using boosting GraphSAGE

BMC Bioinformatics. 2023 Mar 31;24(1):126. doi: 10.1186/s12859-023-05251-x.

Abstract

Background: The human microbiome plays a critical role in maintaining human health. Due to the recent advances in high-throughput sequencing technologies, the microbiome profiles present in the human body have become publicly available. Hence, many works have been done to analyze human microbiome profiles. These works have identified that different microbiome profiles are present in healthy and sick individuals for different diseases. Recently, several computational methods have utilized the microbiome profiles to automatically diagnose and classify the host phenotype.

Results: In this work, a novel deep learning framework based on boosting GraphSAGE is proposed for automatic prediction of diseases from metagenomic data. The proposed framework has two main components, (a). Metagenomic Disease graph (MD-graph) construction module, (b). Disease prediction Network (DP-Net) module. The graph construction module constructs a graph by considering each metagenomic sample as a node in the graph. The graph captures the relationship between the samples using a proximity measure. The DP-Net consists of a boosting GraphSAGE model which predicts the status of a sample as sick or healthy. The effectiveness of the proposed method is verified using real and synthetic datasets corresponding to diseases like inflammatory bowel disease and colorectal cancer. The proposed model achieved a highest AUC of 93%, Accuracy of 95%, F1-score of 95%, AUPRC of 95% for the real inflammatory bowel disease dataset and a best AUC of 90%, Accuracy of 91%, F1-score of 87% and AUPRC of 93% for the real colorectal cancer dataset.

Conclusion: The proposed framework outperforms other machine learning and deep learning models in terms of classification accuracy, AUC, F1-score and AUPRC for both synthetic and real metagenomic data.

Keywords: Deep learning; Disease prediction; Ensemble GNN; GraphSAGE; Machine learning; Metagenomics.

MeSH terms

  • Colorectal Neoplasms* / genetics
  • Humans
  • Inflammatory Bowel Diseases* / genetics
  • Machine Learning
  • Metagenome
  • Metagenomics / methods
  • Microbiota* / genetics