Supervised discretization can discover risk groups in cancer survival analysis

Comput Methods Programs Biomed. 2016 Nov:136:11-9. doi: 10.1016/j.cmpb.2016.08.006. Epub 2016 Aug 17.

Abstract

Discretization of continuous variables is common practice in medical research for identifying patient risk groups. This work compares the performance of gold-standard categorization procedures (TNM+A protocol) with that of three supervised discretization methods from machine learning (CAIM, ChiM and DTree) in the stratification of patients with breast cancer. The performance of the discretization algorithms was evaluated using the results of standard survival analysis procedures such as Kaplan-Meier curves, Cox regression and predictive modelling. The results show that applying alternative discretization algorithms could give clinicians valuable information on the diagnosis and outcome of the disease. Patient data were collected from the Medical Oncology Service of the Hospital Clínico Universitario (Málaga, Spain), with a follow-up period from 1982 to 2008.
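
To make the idea of supervised discretization concrete, here is a minimal sketch of a single decision-tree-style split: it picks the cut point on a continuous variable that maximizes information gain with respect to a binary outcome. It is an illustration only and not the DTree, CAIM or ChiM implementation used in the study. The function names (`entropy`, `best_cut_point`, `discretize`), the variable `tumor_size_mm` and the toy values are all hypothetical and do not come from the paper's data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut_point(values, labels):
    """Return (cut, gain) for the single binary split with the highest
    information gain. Candidate cuts are midpoints between adjacent
    distinct sorted values. Returns (None, 0.0) if no split helps."""
    pairs = sorted(zip(values, labels))
    base = entropy([y for _, y in pairs])
    best_cut, best_gain = None, 0.0
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no boundary between identical values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        weighted = (len(left) * entropy(left)
                    + len(right) * entropy(right)) / len(pairs)
        gain = base - weighted
        if gain > best_gain:
            best_cut, best_gain = (lo + hi) / 2, gain
    return best_cut, best_gain


def discretize(values, cut):
    """Map each value to a two-level group label using the chosen cut."""
    return ["low" if v <= cut else "high" for v in values]


# Hypothetical toy data (NOT from the study): a continuous measurement
# and a binary outcome (1 = event observed, 0 = no event).
tumor_size_mm = [8, 12, 15, 18, 22, 27, 31, 40]
relapse = [0, 0, 0, 0, 1, 1, 1, 1]

cut, gain = best_cut_point(tumor_size_mm, relapse)
groups = discretize(tumor_size_mm, cut)
print(cut, gain, groups)
```

On this toy input the outcome separates perfectly, so the chosen cut is 20.0. In an evaluation like the one described above, the resulting groups would then be compared with survival procedures such as Kaplan-Meier curves or Cox regression. This sketch uses a plain binary outcome and ignores censoring.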

Keywords: Breast cancer free survival; CAIM; ChiMerge; Decision Trees; Predictive models; TNM protocol.

MeSH terms

  • Adult
  • Aged
  • Aged, 80 and over
  • Algorithms
  • Breast Neoplasms / pathology*
  • Female
  • Humans
  • Middle Aged
  • Spain
  • Survival Analysis*