Comparing continuous and discrete analyses of breast cancer survival information

Genomics. 2016 Aug;108(2):78-83. doi: 10.1016/j.ygeno.2016.06.002. Epub 2016 Jun 14.

Abstract

Cancer treatment is becoming increasingly personalized, and biomarkers continue to be developed to refine treatment decisions. Tumour mRNA abundance data is commonly used to develop such biomarkers, often to predict patient survival. However, survival analyses present unique challenges, and it is unknown whether analysing mRNA abundance information in a discrete or a continuous manner yields different results. To address this, we analysed 1988 primary breast tumour transcriptomes. When compared univariately, approximately 60% of all genes showed differences between the discrete and continuous Cox proportional hazards models, with q-value differences spanning four orders of magnitude for some genes. Further, random forest classifiers of poor prognosis built as hybrid models, using both continuous and discrete data, outperformed models using a single type of information. Thus, some genes appear to contribute continuously to poor prognosis while others display threshold effects, and incorporating this distinction into biomarker development is a key unexplored avenue.
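
The univariate comparison described above can be illustrated with a minimal sketch. It is not the authors' code. It assumes median dichotomisation for the discrete analysis (the abstract does not state the cut-point), uses simulated data in place of the tumour transcriptomes, and fits a single-covariate Cox model with a hand-written Newton-Raphson routine. The names cox_univariate, wald_p and simulate are hypothetical helpers introduced for illustration.

```python
import math
import random
from statistics import median


def cox_univariate(times, events, x, max_iter=50, tol=1e-9):
    """Fit h(t | x) = h0(t) * exp(beta * x) by Newton-Raphson on the
    Breslow partial likelihood. Returns (beta, standard_error)."""
    # Walk from the latest time to the earliest so the risk set only grows.
    order = sorted(range(len(times)), key=lambda i: times[i], reverse=True)
    beta = 0.0
    for _ in range(max_iter):
        s0 = s1 = s2 = 0.0  # risk-set sums of w, w*x and w*x^2
        score = info = 0.0
        k = 0
        while k < len(order):
            j = k
            # Everyone sharing this time joins the risk set together.
            while j < len(order) and times[order[j]] == times[order[k]]:
                i = order[j]
                w = math.exp(beta * x[i])
                s0 += w
                s1 += w * x[i]
                s2 += w * x[i] * x[i]
                j += 1
            # Each observed event contributes to the score and information.
            for i in order[k:j]:
                if events[i]:
                    mean = s1 / s0
                    score += x[i] - mean
                    info += s2 / s0 - mean * mean
            k = j
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta, 1.0 / math.sqrt(info)


def wald_p(beta, se):
    """Two-sided Wald p-value for the null hypothesis beta = 0."""
    return math.erfc(abs(beta / se) / math.sqrt(2.0))


def simulate(n, true_beta, seed):
    """Toy cohort (an assumption, not study data): one standard-normal
    'expression' value per patient, exponential survival times whose hazard
    scales with exp(true_beta * expression), and random censoring."""
    rng = random.Random(seed)
    expr = [rng.gauss(0.0, 1.0) for _ in range(n)]
    times, events = [], []
    for v in expr:
        death = rng.expovariate(math.exp(true_beta * v))
        censor = rng.expovariate(0.5)
        times.append(min(death, censor))
        events.append(death <= censor)
    return times, events, expr


times, events, expr = simulate(300, 0.5, seed=0)
cut = median(expr)
groups = [1.0 if v > cut else 0.0 for v in expr]  # discrete: high vs low

for label, covariate in (("continuous", expr), ("discrete", groups)):
    beta, se = cox_univariate(times, events, covariate)
    print(f"{label}: log-HR={beta:.3f}  p={wald_p(beta, se):.3g}")
```

The two log hazard ratios are on different scales (per unit of expression versus high against low), so the p-values are the comparable quantity. The study reports q-values, which would additionally require a multiple-testing correction across all genes; that step is not shown here.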

Keywords: Biomarkers; Continuous data; Cox proportional hazards; Discrete data; Machine learning; Random forest; Survival analysis.

Publication types

  • Comparative Study

MeSH terms

  • Algorithms
  • Biomarkers, Tumor / genetics
  • Breast Neoplasms / genetics*
  • Breast Neoplasms / mortality
  • Female
  • Gene Expression Profiling / methods*
  • Gene Expression Regulation, Neoplastic
  • Humans
  • Models, Statistical*
  • Prognosis
  • Proportional Hazards Models
  • RNA, Messenger / analysis*
  • RNA, Neoplasm / analysis*
  • Survival Analysis

Substances

  • Biomarkers, Tumor
  • RNA, Messenger
  • RNA, Neoplasm