Improve the Colorectal Cancer Diagnosis Using Gut Microbiome Data

Front Mol Biosci. 2022 Aug 12:9:921945. doi: 10.3389/fmolb.2022.921945. eCollection 2022.

Abstract

In the United States, colorectal cancer is the second largest cause of cancer death, and accurate early detection and identification of high-risk patients is a high priority. Although fecal screening tests are available, the close relationship between colorectal cancer and the gut microbiome has generated considerable interest. We describe a machine learning method for gut microbiome data to assist in diagnosing colorectal cancer. Our methodology integrates feature engineering, mediation analysis, statistical modeling, and network analysis into a novel unified pipeline. Simulation results illustrate the value of the method in comparison to existing methods. For predicting colorectal cancer in two real datasets, this pipeline showed an 8.7% higher prediction accuracy and 13% higher area under the receiver operator characteristic curve than other published work. Additionally, the approach highlights important colorectal cancer-related taxa for prioritization, such as high levels of Bacteroides fragilis, which can help elucidate disease pathology. Our algorithms and approach can be widely applied for Colorectal cancer prediction using either 16 S rRNA or shotgun metagenomics data.

Keywords: 16S rRNA; colorectal cancer; feature engineering; machine learning; mediation analysis; prediction.