Blood Biomarkers Panels for Screening of Colorectal Cancer and Adenoma on a Machine Learning-Assisted Detection Platform

Cancer Control. 2023 Jan-Dec:30:10732748231222109. doi: 10.1177/10732748231222109.

Abstract

Objective: A minimally invasive screening program with good compliance is critical to broaden colorectal cancer (CRC) screening and reduce CRC-related mortality. Blood testing combined with imaging examination has been shown to be feasible for multicancer screening and for guiding intervention. This study aimed to construct a machine learning-assisted detection platform using available multi-target blood biomarkers for CRC and colorectal adenoma (CRA) screening.

Methods: In this retrospective study, blood test data were extracted from 204 patients with CRC, 384 patients with CRA, and 229 healthy controls. Classification models were constructed from the candidate biomarkers with 4 machine learning (ML) algorithms: support vector machine (SVM), random forest (RF), decision tree (DT), and eXtreme Gradient Boosting (XGB). SHapley Additive exPlanations (SHAP) analysis was used to rank feature importance and identify the dominant characteristics. The performance of the classification models was evaluated. The most dominant features of the proposed panel were then used to build a logistic regression (LR) model for distinguishing CRC from controls.
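The final step above, fitting a logistic regression on a few dominant features, can be illustrated with a minimal sketch. This is not the authors' code. The two markers, their values, the learning rate, and the epoch count are all assumptions made for illustration, and the data are synthetic.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this sample (predicted probability minus label).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    # Predicted probability that sample x belongs to the case (label 1) group.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Synthetic, standardized values for two hypothetical markers; NOT study data.
random.seed(0)
controls = [[random.gauss(-1.0, 1.0), random.gauss(-1.0, 1.0)] for _ in range(100)]
cases = [[random.gauss(1.0, 1.0), random.gauss(1.0, 1.0)] for _ in range(100)]
X = controls + cases
y = [0] * 100 + [1] * 100

w, b = fit_logistic(X, y)
accuracy = sum(
    (predict_proba(w, b, xi) >= 0.5) == bool(yi) for xi, yi in zip(X, y)
) / len(y)
print(f"training accuracy on synthetic data: {accuracy:.3f}")
```

In practice a library implementation with regularization and a held-out test set would be preferable; the hand-written fit is used here only to keep the sketch self-contained.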

Results: The candidate panel comprised 26 biomarkers, including CEA and AFP. Among the 4 models, the SVM classifier for CRA yielded the best predictive performance (area under the receiver operating characteristic curve, AUC: .925, sensitivity: .904, and specificity: .771). For CRC classification, the RF model with the 26 candidate biomarkers provided the best predictive performance (AUC: .941, sensitivity: .902, and specificity: .912). Predictive performance was significantly better than that of CEA and CA199. The streamlined model with 6 biomarkers for CRC also performed well (AUC: .946, sensitivity: .885, and specificity: .913).
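For readers less familiar with the reported metrics, the sketch below shows how sensitivity, specificity, and AUC are conventionally computed from true labels and model scores. The labels, scores, and the 0.5 decision threshold are toy assumptions, not study data, and the helper names are hypothetical.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy labels (1 = case, 0 = control) and model scores; NOT study data.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

# Assumed decision threshold of 0.5 for turning scores into class calls.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(sens, spec, auc)  # 0.75 0.75 0.8125
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the abstract reports all three.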

Conclusions: The predictive models based on the 26-biomarker multi-target panel could serve as a non-invasive, economical, and effective risk stratification platform, and are expected to be applicable to auxiliary screening of CRA and CRC in clinical practice.

Keywords: colorectal adenoma; colorectal cancer; eXtreme gradient boosting; machine learning algorithms; random forest; support vector machine.

MeSH terms

  • Adenoma* / diagnosis
  • Biomarkers
  • Colorectal Neoplasms* / diagnosis
  • Early Detection of Cancer
  • Humans
  • Machine Learning
  • Retrospective Studies

Substances

  • Biomarkers