A Goldilocks Principle for the Gut Microbiome: Taxonomic Resolution Matters for Microbiome-Based Classification of Colorectal Cancer

mBio. 2022 Feb 22;13(1):e0316121. doi: 10.1128/mbio.03161-21. Epub 2022 Jan 11.

Abstract

Colorectal cancer is a common and deadly disease in the United States, accounting for over 50,000 deaths in 2020. This progressive disease is highly preventable with early detection and treatment, but many people do not comply with the recommended screening guidelines. The gut microbiome has emerged as a promising target for noninvasive detection of colorectal cancer. Most microbiome-based classification efforts utilize taxonomic abundance data from operational taxonomic units (OTUs) or amplicon sequence variants (ASVs) with the goal of increasing taxonomic resolution. However, it is unknown which taxonomic resolution is optimal for microbiome-based classification of colorectal cancer. To address this question, we used a reproducible machine learning framework to quantify classification performance of models based on data annotated to phylum, class, order, family, genus, OTU, and ASV levels. We found that model performance increased with increasing taxonomic resolution up to the family level, where performance was equal (P > 0.05) among family (mean area under the receiver operating characteristic curve [AUROC], 0.689), genus (mean AUROC, 0.690), and OTU (mean AUROC, 0.693) levels before decreasing at the ASV level (P < 0.05; mean AUROC, 0.676). These results demonstrate a trade-off between taxonomic resolution and prediction performance, where coarse taxonomic resolution (e.g., phylum) is not distinct enough, but fine resolution (e.g., ASV) is too individualized to accurately classify samples. Similar to the story of Goldilocks and the three bears (L. B. Cauley, Goldilocks and the Three Bears, 1981), mid-range resolution (i.e., family, genus, and OTU) is "just right" for optimal prediction of colorectal cancer from microbiome data.

IMPORTANCE Despite being highly preventable, colorectal cancer remains a leading cause of cancer-related death in the United States. Low-cost, noninvasive detection methods could greatly improve our ability to identify and treat early stages of disease. The microbiome has shown promise as a resource for detection of colorectal cancer. Research on the gut microbiome tends to focus on improving our ability to profile species and strain level taxonomic resolution. However, we found that finer resolution impedes the ability to predict colorectal cancer based on the gut microbiome. These results highlight the need for consideration of the appropriate taxonomic resolution for microbiome analyses and that finer resolution is not always more informative.
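
The two operations the abstract relies on, collapsing abundance data to a coarser taxonomic level and scoring a classifier by AUROC, can be illustrated with a minimal Python sketch. This is not the authors' framework: the taxonomy table, the feature names (otu1 to otu3), the counts, and the helper names collapse_counts and auroc are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical taxonomy table: feature id -> label at each rank (illustrative only).
TAXONOMY = {
    "otu1": {"phylum": "Firmicutes", "family": "Lachnospiraceae", "genus": "Blautia"},
    "otu2": {"phylum": "Firmicutes", "family": "Lachnospiraceae", "genus": "Roseburia"},
    "otu3": {"phylum": "Fusobacteria", "family": "Fusobacteriaceae", "genus": "Fusobacterium"},
}


def collapse_counts(sample, taxonomy, rank):
    """Sum the counts of all features that share a label at the given rank."""
    collapsed = defaultdict(int)
    for feature, count in sample.items():
        collapsed[taxonomy[feature][rank]] += count
    return dict(collapsed)


def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs in which the
    positive sample has the higher score; tied pairs count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up counts for one sample, collapsed to two coarser resolutions.
sample = {"otu1": 10, "otu2": 5, "otu3": 2}
by_family = collapse_counts(sample, TAXONOMY, "family")
by_phylum = collapse_counts(sample, TAXONOMY, "phylum")
```

Collapsing merges features that were distinct at finer resolution (here, two genera fall into one family), which is the mechanism behind the trade-off described above: coarse levels lose distinguishing signal, while very fine levels yield features that few samples share.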

Keywords: 16S rRNA gene sequencing; colon cancer; machine learning; microbiome; taxonomic level.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Bacteria / genetics
  • Colorectal Neoplasms*
  • Gastrointestinal Microbiome*
  • Humans
  • Microbiota*
  • RNA, Ribosomal, 16S

Substances

  • RNA, Ribosomal, 16S