Machine Learning Based Metagenomic Prediction of Inflammatory Bowel Disease

Stud Health Technol Inform. 2021 Oct 27;285:165-170. doi: 10.3233/SHTI210591.

Abstract

In this study, we investigate faecal microbiota composition to evaluate how well classification algorithms identify inflammatory bowel disease (IBD) and its two types, Crohn's disease (CD) and ulcerative colitis (UC). Of the many algorithms investigated, a random forest (RF) classifier was selected for detailed evaluation in a three-class classification task (CD versus UC versus nonIBD) and two binary classification tasks (nonIBD versus IBD, and CD versus UC). We addressed class imbalance and performed an extensive parameter search, dimensionality reduction, and two-level classification. In three-class classification, our best model reaches an average F1 score of 91%, which confirms the strong connection between IBD and the gastrointestinal microbiome. Among the most important features in three-class classification are the species Staphylococcus hominis, Porphyromonas endodontalis, and Slackia piriformis, and the genus Bacteroidetes.
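As a rough illustration of two ideas named in the abstract, the sketch below shows how a two-level scheme could chain the two binary decisions (nonIBD versus IBD, then CD versus UC) into a three-class label, and how an F1 score averaged over the three classes could be computed. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say how the two levels are combined or whether the reported average is taken over classes or over validation folds. The function names `two_level_predict` and `macro_f1`, and the callables `is_ibd` and `is_cd`, are hypothetical stand-ins for trained classifiers.

```python
from collections import Counter

# Class labels as named in the abstract.
LABELS = ("nonIBD", "CD", "UC")


def two_level_predict(sample, is_ibd, is_cd):
    """Chain two binary decisions into one three-class label.

    `is_ibd` and `is_cd` are hypothetical callables standing in for
    trained binary classifiers; each returns True or False for a sample.
    """
    # Level 1: separate nonIBD from IBD.
    if not is_ibd(sample):
        return "nonIBD"
    # Level 2: only samples flagged as IBD are split into CD versus UC.
    return "CD" if is_cd(sample) else "UC"


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Averaging per-class scores without weighting gives the minority class the same influence as the majority class, which is one common reason to prefer it when classes are imbalanced.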

Keywords: feature selection; imbalance; machine learning; microbiome.

MeSH terms

  • Actinobacteria
  • Bacteroidetes
  • Colitis, Ulcerative* / diagnosis
  • Colitis, Ulcerative* / microbiology
  • Crohn Disease* / diagnosis
  • Crohn Disease* / microbiology
  • Gastrointestinal Microbiome*
  • Humans
  • Inflammatory Bowel Diseases* / diagnosis
  • Inflammatory Bowel Diseases* / microbiology
  • Machine Learning
  • Porphyromonas endodontalis
  • Staphylococcus hominis

Supplementary concepts

  • Slackia piriformis