Constructing multivariate prognostic gene signatures with censored survival data

Methods Mol Biol. 2013:972:85-101. doi: 10.1007/978-1-60327-337-4_6.

Abstract

Modern high-throughput technologies allow us to simultaneously measure the expressions of a huge number of candidate predictors, some of which are likely to be associated with survival. One difficult task is to search among an enormous number of potential predictors and to correctly identify most of the important ones, without mistakenly identifying too many spurious associations. Mere variable selection is insufficient, however, for the information from the multiple predictors must be intelligently combined and calibrated to form the final composite predictor. Many commonly used procedures overfit the training data, miss many important predictors, or both. Although it is impossible to simultaneously adjust for a huge number of predictors in an unconstrained way, we propose a method that offers a middle ground where some partial multivariate adjustments can be made in an adaptive fashion, regardless of the number of candidate predictors. We demonstrate the performance of our proposed procedure in a simulation study within the Cox proportional hazards regression framework, and we apply our new method to a publicly available data set to construct a novel prognostic gene signature for breast cancer survival.
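To make the Cox proportional hazards framework named in the abstract concrete, here is a minimal sketch of how a composite predictor can be scored against censored survival data using the Cox partial log-likelihood (Breslow handling of ties). It is not the authors' procedure, which the abstract does not specify. The function names, the weights, and the toy data are all hypothetical and exist only for illustration.

```python
import math


def composite_score(x_row, weights):
    """Combine several predictors into one linear risk score (hypothetical helper)."""
    return sum(w * x for w, x in zip(weights, x_row))


def cox_partial_loglik(times, events, scores):
    """Cox partial log-likelihood for one risk score per subject.

    times  : observed follow-up times
    events : 1 if the event was observed, 0 if the subject was censored
    scores : linear predictor (composite score) for each subject
    """
    loglik = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored subjects contribute only through risk sets
        # Risk set: every subject still under observation at time t_i
        risk_sum = sum(math.exp(s) for t, s in zip(times, scores) if t >= t_i)
        loglik += scores[i] - math.log(risk_sum)
    return loglik


# Made-up toy data: four subjects, two expression measurements each.
times = [2.0, 3.0, 5.0, 7.0]
events = [1, 0, 1, 1]            # the second subject is censored
expression = [[2.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 0.0]]

null_weights = [0.0, 0.0]        # no predictor carries any information
trial_weights = [1.0, 0.0]       # assumed weights, not fitted values

null_scores = [composite_score(row, null_weights) for row in expression]
trial_scores = [composite_score(row, trial_weights) for row in expression]

null_loglik = cox_partial_loglik(times, events, null_scores)
trial_loglik = cox_partial_loglik(times, events, trial_scores)
```

In this toy example the trial weights give higher scores to subjects with earlier events, so the partial log-likelihood exceeds its null value. With many candidate predictors and few events, maximizing this quantity without constraints tends to overfit, which is the difficulty the abstract describes.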

Publication types

  • Research Support, N.I.H., Intramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Breast Neoplasms / genetics
  • Breast Neoplasms / mortality*
  • Computer Simulation
  • Data Interpretation, Statistical
  • Female
  • Gene Expression Profiling / methods
  • Humans
  • Likelihood Functions
  • Models, Statistical
  • Multivariate Analysis
  • Principal Component Analysis
  • Prognosis
  • Proportional Hazards Models
  • Transcriptome