Machine Learning in Prediction of Second Primary Cancer and Recurrence in Colorectal Cancer

Int J Med Sci. 2020 Jan 15;17(3):280-291. doi: 10.7150/ijms.37134. eCollection 2020.

Abstract

Background: Colorectal cancer (CRC) is the third most commonly diagnosed cancer worldwide. Recurrence of CRC (Re) and onset of a second primary malignancy (SPM) are important indicators in treating CRC, but it is often difficult to predict the onset of an SPM. Therefore, we used machine learning to identify risk factors that affect Re and SPM.

Patients and methods: CRC patients were identified from the cancer registry databases of three medical centers. All patients were classified by recurrence status (Re or no recurrence, NRe) and by SPM status (SPM or no SPM, NSPM). Two classifiers, A Library for Support Vector Machines (LIBSVM) and Reduced Error Pruning Tree (REPTree), were applied to construct optimized models relating clinical features to Re and/or SPM category.
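
The two binary classifications described here combine into four categories (SPM+Re, NSPM+Re, SPM+NRe, NSPM+NRe). A minimal sketch of that labeling step follows. It is an illustration only, not the authors' code: the function name `combined_category`, the boolean inputs, and the example patients are assumptions introduced here.

```python
def combined_category(has_spm: bool, has_recurrence: bool) -> str:
    """Map two binary outcomes to one of the four combined labels."""
    spm = "SPM" if has_spm else "NSPM"
    rec = "Re" if has_recurrence else "NRe"
    return f"{spm}+{rec}"


# Hypothetical patients as (has_spm, has_recurrence) pairs; not study data.
patients = [(True, True), (False, True), (True, False), (False, False)]
labels = [combined_category(s, r) for s, r in patients]
print(labels)
```

Encoding the combined outcome as a single label lets one multi-class model be evaluated per category, in addition to separate models for Re and SPM.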

Results: When Re and SPM were evaluated separately, the accuracy of LIBSVM was 0.878 and that of REPTree was 0.622. When Re and SPM were evaluated in combination, the precision of models for SPM+Re, NSPM+Re, SPM+NRe, and NSPM+NRe was 0.878, 0.662, 0.774, and 0.778, respectively.
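
For readers unfamiliar with the two metrics reported here, the sketch below shows how accuracy and per-class precision are conventionally computed from true and predicted labels. It assumes the standard definitions (accuracy = correct predictions / all predictions; precision for a class = correct predictions of that class / all predictions of that class); the abstract does not state the exact evaluation procedure used. The helper names and the label lists are made up for illustration and are not data from the study.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_precision(y_true, y_pred):
    """For each predicted class: true positives / all predictions of that class."""
    predicted = Counter(y_pred)
    true_pos = Counter(p for t, p in zip(y_true, y_pred) if t == p)
    return {label: true_pos[label] / n for label, n in predicted.items()}


# Made-up labels for illustration only; not data from the study.
y_true = ["SPM+Re", "SPM+Re", "NSPM+Re", "NSPM+NRe", "SPM+NRe", "NSPM+NRe"]
y_pred = ["SPM+Re", "NSPM+Re", "NSPM+Re", "NSPM+NRe", "SPM+NRe", "SPM+NRe"]

acc = accuracy(y_true, y_pred)
prec = per_class_precision(y_true, y_pred)
print(acc, prec)
```

Accuracy summarizes a model over all patients, whereas precision is reported per category, which is why the combined evaluation lists four separate values.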

Conclusions: Machine learning can be used to rank factors affecting tumor Re and SPM. In clinical practice, routine checkups are necessary to ensure early detection of new tumors. The success of prediction and early detection may be enhanced in the future by applying "big data" analysis methods such as machine learning.

Keywords: colorectal cancer; machine learning; second primary malignancy.

MeSH terms

  • Colorectal Neoplasms / diagnosis*
  • Female
  • Humans
  • Machine Learning*
  • Male
  • Risk Factors
  • Support Vector Machine