A review of statistical and machine learning methods for modeling cancer risk using structured clinical data

Artif Intell Med. 2018 Aug:90:1-14. doi: 10.1016/j.artmed.2018.06.002. Epub 2018 Jul 14.

Abstract

Advances in oncology continue to improve the prevention and treatment of cancers. To reduce the impact and lethality of cancers, they must be detected early. There is also a risk that cancer recurs after potentially curative treatment. Predictive models can be built from historical patient data to capture the characteristics of patients who developed cancer or relapsed. These models can then be deployed in clinical settings to determine whether new patients are at high risk of cancer development or recurrence. Building large-scale predictive models requires structured data captured from a wide and diverse range of patients. This paper reviews current methods for building cancer risk models using structured clinical patient data. Trends in statistical and machine learning techniques are explored, and gaps are identified for future research. Cancer risk prediction is a high-impact field, and research must continue if these models are to be adopted for clinical decision support for both practitioners and patients.

Keywords: Cancer prediction; Cancer recurrence; Cancer relapse; Data mining; Electronic health records; Machine learning.

Publication types

  • Review

MeSH terms

  • Clinical Decision-Making
  • Data Interpretation, Statistical
  • Data Mining / methods*
  • Data Mining / statistics & numerical data
  • Decision Support Techniques*
  • Decision Trees
  • Diagnosis, Computer-Assisted / methods*
  • Early Detection of Cancer / methods*
  • Early Detection of Cancer / statistics & numerical data
  • Electronic Health Records* / statistics & numerical data
  • Humans
  • Machine Learning*
  • Neoplasm Staging
  • Neoplasms / diagnosis*
  • Neoplasms / epidemiology
  • Neoplasms / therapy
  • Nomograms
  • Recurrence
  • Risk Assessment
  • Risk Factors