A novel colorectal cancer screening framework with feature interpretability to identify high-risk populations for colonoscopy

J Gastroenterol Hepatol. 2024 May 14. doi: 10.1111/jgh.16600. Online ahead of print.

Abstract

Background and aim: Risk assessment is of paramount importance for the detection and treatment of colorectal cancer. We developed and validated a screening framework with feature interpretability to identify high-risk populations and recommend them for colonoscopy.

Methods: To develop the screening framework, we used a training cohort of 1 252 605 participants who underwent colonoscopy in Shanghai from 2013 to 2015. We incorporated Shapley additive explanation (SHAP) values into feature selection to make the framework interpretable. Two sampling methods were applied separately to mitigate potential model bias caused by class imbalance. We then built risk assessment models with several machine learning algorithms and compared their performance. Finally, we tested the screening models on an external validation cohort of 359 462 samples and performed a comprehensive evaluation and statistical analysis of the validation results.
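The abstract does not name the two sampling methods or the exact SHAP-based selection rule. As a minimal sketch, assuming random undersampling and random oversampling as the two methods and a "keep the top k features by mean absolute SHAP value" rule, the preprocessing steps might look like this in Python. The SHAP values are treated as precomputed inputs, and all function names, feature names, and numbers in the usage example are hypothetical, not taken from the study.

```python
import random
from typing import List, Sequence, Tuple


def top_k_by_mean_abs_shap(shap_values: Sequence[Sequence[float]],
                           feature_names: Sequence[str],
                           top_k: int) -> List[str]:
    """Rank features by mean |SHAP| over samples and keep the top_k names.

    shap_values[i][j] is the precomputed SHAP value of feature j for sample i.
    """
    n = len(shap_values)
    importance = [sum(abs(row[j]) for row in shap_values) / n
                  for j in range(len(feature_names))]
    order = sorted(range(len(feature_names)),
                   key=lambda j: importance[j], reverse=True)
    return [feature_names[j] for j in order[:top_k]]


def _split_by_class(y: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Return (minority_indices, majority_indices) for binary 0/1 labels."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    return (pos, neg) if len(pos) <= len(neg) else (neg, pos)


def random_undersample(X, y, seed=0):
    """Randomly drop majority-class rows until both classes have equal size."""
    rng = random.Random(seed)
    minority, majority = _split_by_class(y)
    keep = sorted(minority + rng.sample(majority, len(minority)))
    return [X[i] for i in keep], [y[i] for i in keep]


def random_oversample(X, y, seed=0):
    """Randomly duplicate minority-class rows until both classes have equal size."""
    rng = random.Random(seed)
    minority, majority = _split_by_class(y)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = list(range(len(y))) + extra
    return [X[i] for i in idx], [y[i] for i in idx]


if __name__ == "__main__":
    # Hypothetical SHAP matrix: 3 samples x 4 features (not study data).
    names = ["feature_a", "feature_b", "feature_c", "feature_d"]
    shap = [[0.40, -0.05, 0.90, 0.10],
            [-0.30, 0.02, 0.70, -0.20],
            [0.50, 0.01, -0.80, 0.05]]
    print(top_k_by_mean_abs_shap(shap, names, top_k=2))  # ['feature_c', 'feature_a']

    # Toy imbalanced labels: 2 positives, 8 negatives.
    X = [[float(i)] for i in range(10)]
    y = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
    Xu, yu = random_undersample(X, y)
    print(len(yu), yu.count(1))  # 4 2
    Xo, yo = random_oversample(X, y)
    print(len(yo), yo.count(1))  # 16 8
```

Resampling of this kind is normally applied to the training data only; an external validation cohort would be left at its natural class ratio so that sensitivity and specificity reflect real-world prevalence.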

Results: In external validation, the models in the proposed framework achieved a sensitivity above 0.734, a specificity above 0.790, and an area under the receiver operating characteristic curve (AUC) of 0.808 to 0.859. Among participants stratified by the best-performing model, the prevalence of colorectal cancer was 0.059% in the low-risk group and 1.056% in the high-risk group. Performing colonoscopy only on the model-predicted high-risk group would require just 14.36% of the total colonoscopies while still detecting 74.86% of colorectal cancer cases.
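The reported metrics and the referral trade-off can be made concrete with a short Python sketch. The helper names and toy labels are hypothetical; the area under the ROC curve is computed with its rank (Mann-Whitney) interpretation; and the last function assumes that 14.36% is the share of the validation cohort placed in the high-risk group and that the quoted prevalences are within-group rates.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int],
                            y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP), for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc_pairwise(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(random positive outscores random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def share_of_cases_in_high_risk(frac_high: float, prev_high: float,
                                prev_low: float) -> float:
    """Fraction of all cancers that fall in the high-risk group.

    frac_high: share of the cohort classified high risk, i.e. the share of
    colonoscopies if only that group is scoped; prev_*: within-group prevalence.
    """
    cases_high = frac_high * prev_high
    cases_low = (1.0 - frac_high) * prev_low
    return cases_high / (cases_high + cases_low)


if __name__ == "__main__":
    # Toy labels and scores (hypothetical, not study data).
    sens, spec = sensitivity_specificity([1, 1, 0, 0, 0], [1, 0, 0, 0, 1])
    print(sens, round(spec, 3))  # 0.5 0.667
    print(round(auc_pairwise([1, 1, 0, 0, 0], [0.9, 0.4, 0.3, 0.2, 0.6]), 3))  # 0.833
    # Figures from the abstract: 14.36% referred, 1.056% vs 0.059% prevalence.
    print(f"{share_of_cases_in_high_risk(0.1436, 0.01056, 0.00059):.4f}")  # 0.7501
```

Plugging in the published figures gives about 0.750, consistent with the reported 74.86% once rounding of the prevalences to three decimals is allowed for. The pairwise AUC loop is quadratic in cohort size, so for a cohort of 359 462 samples a sort-based routine such as scikit-learn's `roc_auc_score` would be the practical choice.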

Conclusions: We developed and validated a novel framework to identify populations at high risk for colorectal cancer. Those classified as high risk should undergo colonoscopy for further diagnosis.

Keywords: Class imbalance; Colorectal cancer; Feature interpretability; Machine learning.