Development and evaluation of a colorectal cancer screening method using machine learning-based gut microbiota analysis

Cancer Med. 2022 Aug;11(16):3194-3206. doi: 10.1002/cam4.4671. Epub 2022 Mar 22.

Abstract

Accumulating evidence indicates that alterations of the gut microbiota are associated with colorectal cancer (CRC), and the use of the gut microbiota for CRC diagnosis has therefore received attention. Several recent studies have applied machine learning to meta-sequencing data of gut bacterial DNA to detect differences in the gut microbiota between healthy individuals and CRC patients, and have used this information to develop CRC diagnostic models. However, to date, most studies have had small sample sizes and/or have only cross-validated the model on the training dataset used to create it, rather than validating it on an independent test dataset. Because machine learning-based diagnostic models are prone to overfitting when the sample size is small and/or no independent test dataset is used for validation, the reliability of these diagnostic models should be interpreted with caution. To address these problems, we established a new machine learning-based CRC diagnostic model that uses the gut microbiota as an indicator. Validation using independent test datasets showed that, with the false positive rate set at around 8%, the true positive rate of our CRC diagnostic model increased substantially as CRC progressed beyond Stage I, exceeding 60% for patients with CRC more advanced than Stage II. Moreover, the true positive rate did not differ significantly between samples collected in different cities or between cancers located in different parts of the colorectum. These results suggest that gut microbiota-based CRC screening tests could be applied in practice.
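The evaluation described above (fixing the false positive rate on an independent test set, then reading off the true positive rate per stage) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `threshold_for_fpr` and `tpr_by_stage` are hypothetical, and the scores are synthetic toy values standing in for model outputs on held-out samples, not data from the study.

```python
def threshold_for_fpr(control_scores, target_fpr):
    """Pick a cutoff so that at most `target_fpr` of healthy controls score above it."""
    ranked = sorted(control_scores, reverse=True)
    # Number of controls we allow to be called positive (false positives).
    allowed = int(target_fpr * len(ranked))
    # Scores strictly above the (allowed + 1)-th highest control score are called positive.
    return ranked[allowed] if allowed < len(ranked) else float("-inf")


def tpr_by_stage(case_scores, case_stages, threshold):
    """True positive rate among CRC cases, grouped by stage, at a fixed cutoff."""
    hits, totals = {}, {}
    for score, stage in zip(case_scores, case_stages):
        totals[stage] = totals.get(stage, 0) + 1
        hits[stage] = hits.get(stage, 0) + (score > threshold)
    return {stage: hits[stage] / totals[stage] for stage in totals}


# Synthetic toy scores for an independent test set (assumption: higher = more CRC-like).
controls = [i / 100 for i in range(2, 52, 2)]  # 25 healthy controls
case_scores = [0.30, 0.60, 0.40, 0.20,   # Stage I
               0.70, 0.45, 0.80, 0.30,   # Stage II
               0.90, 0.55, 0.40, 0.85]   # Stage III
case_stages = ["I"] * 4 + ["II"] * 4 + ["III"] * 4

threshold = threshold_for_fpr(controls, target_fpr=0.08)
fpr = sum(s > threshold for s in controls) / len(controls)
tpr = tpr_by_stage(case_scores, case_stages, threshold)
print(f"cutoff={threshold}, FPR={fpr:.2f}, TPR by stage={tpr}")
```

For the figures to be meaningful, the scores must come from samples that were not used to train or tune the model. In this sketch the cutoff is also derived from held-out control scores; in practice it could equally be fixed in advance on separate data.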

Keywords: biomarkers; colorectal cancer; next generation sequencing; screening.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Colorectal Neoplasms* / diagnosis
  • Colorectal Neoplasms* / microbiology
  • Early Detection of Cancer
  • Gastrointestinal Microbiome*
  • Humans
  • Machine Learning
  • Reproducibility of Results