Machine learning-based colorectal cancer prediction using global dietary data

BMC Cancer. 2023 Feb 10;23(1):144. doi: 10.1186/s12885-023-10587-x.

Abstract

Background: Colorectal cancer (CRC) is the third most commonly diagnosed cancer worldwide. Active health screening for CRC yielded detection of an increasingly younger adults. However, current machine learning algorithms that are trained using older adults and smaller datasets, may not perform well in practice for large populations.

Aim: To evaluate machine learning algorithms using large datasets accounting for both younger and older adults from multiple regions and diverse sociodemographics.

Methods: A large dataset including 109,343 participants in a dietary-based colorectal cancer ase study from Canada, India, Italy, South Korea, Mexico, Sweden, and the United States was collected by the Center for Disease Control and Prevention. This global dietary database was augmented with other publicly accessible information from multiple sources. Nine supervised and unsupervised machine learning algorithms were evaluated on the aggregated dataset.

Results: Both supervised and unsupervised models performed well in predicting CRC and non-CRC phenotypes. A prediction model based on an artificial neural network (ANN) was found to be the optimal algorithm with CRC misclassification of 1% and non-CRC misclassification of 3%.

Conclusions: ANN models trained on large heterogeneous datasets may be applicable for both younger and older adults. Such models provide a solid foundation for building effective clinical decision support systems assisting healthcare providers in dietary-related, non-invasive screening that can be applied in large studies. Using optimal algorithms coupled with high compliance to cancer screening is expected to significantly improve early diagnoses and boost the success rate of timely and appropriate cancer interventions.

Keywords: Colorectal cancer; Dietary information; Machine learning.

MeSH terms

  • Algorithms
  • Colorectal Neoplasms* / diagnosis
  • Colorectal Neoplasms* / epidemiology
  • Diet
  • Humans
  • Machine Learning*
  • Neural Networks, Computer