Systematic evaluation of supervised classifiers for fecal microbiota-based prediction of colorectal cancer

Oncotarget. 2017 Feb 7;8(6):9546-9556. doi: 10.18632/oncotarget.14488.

Abstract

Predicting colorectal cancer (CRC) from fecal microbiota is a promising method for non-invasive CRC screening, but the optimization of classification models remains unaddressed. The purpose of this study was to systematically evaluate the effectiveness of different supervised machine-learning models in predicting CRC in two independent populations, one eastern and one western. The structure of the fecal microbiota in a Chinese population (N = 141) was determined by 454 FLX pyrosequencing, and different supervised classifiers were employed to predict CRC based on fecal microbiota operational taxonomic units (OTUs). Bayes Net and Random Forest displayed higher accuracies than the other algorithms in both populations, although Bayes Net had a lower false negative rate than Random Forest. Gut microbiota-based prediction was more accurate than the standard fecal occult blood test (FOBT), and combining the two approaches further improved prediction accuracy. Moreover, when unclassified OTUs were used as input, the BayesDMNB text algorithm achieved higher accuracy in the Chinese population (AUC = 0.994). Taken together, our results suggest that a Bayes Net classification model combined with unclassified OTUs may provide an accurate method for predicting CRC based on the composition of the gut microbiota.
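
The abstract compares classifiers by accuracy, false negative rate, and AUC. As a minimal sketch of how these three metrics relate, the Python below computes them for two hypothetical classifiers on made-up toy data. The labels, scores, classifier names, and the 0.5 cutoff are all assumptions for illustration; they are not the study's data, models, or pipeline.

```python
# Illustrative only: labels and scores are invented toy values,
# not data from the study. Label 1 = CRC, label 0 = healthy control.

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

def false_negative_rate(y_true, y_pred):
    """Fraction of true CRC cases (label 1) that were predicted healthy (label 0)."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return fn / (fn + tp)

def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half (pairwise definition)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def threshold(scores, cutoff=0.5):
    """Turn predicted probabilities into hard labels (cutoff is an assumption)."""
    return [1 if s >= cutoff else 0 for s in scores]

# Hypothetical predicted CRC probabilities from two unnamed classifiers.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores_a = [0.9, 0.8, 0.6, 0.4, 0.5, 0.3, 0.2, 0.1]
scores_b = [0.9, 0.7, 0.45, 0.3, 0.6, 0.4, 0.2, 0.1]

results = {}
for name, scores in (("classifier_a", scores_a), ("classifier_b", scores_b)):
    y_pred = threshold(scores)
    results[name] = {
        "accuracy": accuracy(y_true, y_pred),
        "fnr": false_negative_rate(y_true, y_pred),
        "auc": auc(y_true, scores),
    }
    print(name, results[name])
```

In a screening setting the false negative rate deserves separate attention because a missed CRC case is costlier than a false alarm, which is presumably why the abstract reports it alongside accuracy when comparing Bayes Net with Random Forest.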

Keywords: CRC; gut microbiota; prediction; supervised classifier.

MeSH terms

  • Aged
  • Algorithms
  • Bacteria / classification*
  • Bacteria / isolation & purification
  • Bacteriological Techniques
  • China
  • Colorectal Neoplasms / diagnosis
  • Colorectal Neoplasms / microbiology*
  • Feces / microbiology*
  • Female
  • France
  • Gastrointestinal Microbiome*
  • Gastrointestinal Tract / microbiology*
  • Humans
  • Male
  • Middle Aged
  • Occult Blood
  • Predictive Value of Tests
  • Prognosis
  • Reproducibility of Results
  • Risk Assessment
  • Risk Factors