Assessment of evaluation criteria for survival prediction from genomic data

Biom J. 2011 Mar;53(2):202-16. doi: 10.1002/bimj.201000048. Epub 2011 Feb 10.

Abstract

Survival prediction from high-dimensional genomic data depends on a proper regularization method. With an increasing number of such methods proposed in the literature, comparative studies are called for, and some have been performed. However, there is currently no consensus on which prediction assessment criterion should be used for time-to-event data. Without firm knowledge of whether the choice of evaluation criterion affects the conclusions as to which regularization method performs best, these comparative studies may be of limited value. In this paper, four evaluation criteria are investigated: the log-rank test for two groups, the area under the time-dependent ROC curve (AUC), an R²-measure based on the Cox partial likelihood, and an R²-measure based on the Brier score. The criteria are compared according to how they rank six widely used regularization methods that are based on the Cox regression model, namely univariate selection, principal components regression (PCR), supervised PCR, partial least squares regression, ridge regression, and the lasso. Based on our application to three microarray gene expression data sets, we find that the results obtained from the widely used log-rank test deviate from those of the other three criteria studied. For future studies, where one also might want to include non-likelihood or non-model-based regularization methods, we argue in favor of AUC and the R²-measure based on the Brier score, as these neither suffer from the arbitrary splitting into two groups nor depend on the Cox partial likelihood.
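
To make the first criterion concrete, the following is a minimal sketch of a two-group log-rank statistic computed after splitting patients on a prognostic index. It is not the authors' code. The function names (`logrank_chi2`, `split_by_median`) and the choice of the median as cut point are assumptions made for illustration; the abstract states only that the split into two groups is arbitrary.

```python
from statistics import median


def split_by_median(prognostic_index):
    """Assign each subject to group 1 (index above the median) or group 0.

    The median cut point is an assumption; any other threshold could be
    used, which is the arbitrariness the abstract refers to.
    """
    cut = median(prognostic_index)
    return [1 if x > cut else 0 for x in prognostic_index]


def logrank_chi2(times, events, groups):
    """Two-group log-rank chi-square statistic (1 degree of freedom).

    times:  observed follow-up time per subject
    events: 1 if the event was observed, 0 if the subject was censored
    groups: 0/1 group label per subject
    """
    data = list(zip(times, events, groups))
    # The statistic is accumulated over the distinct observed event times.
    event_times = sorted({t for t, e, _ in data if e == 1})

    obs_minus_exp = 0.0  # observed minus expected events in group 1
    variance = 0.0       # hypergeometric variance of that difference

    for t in event_times:
        n = sum(1 for u, _, _ in data if u >= t)               # at risk, total
        n1 = sum(1 for u, _, g in data if u >= t and g == 1)   # at risk, group 1
        d = sum(1 for u, e, _ in data if u == t and e == 1)    # events, total
        d1 = sum(1 for u, e, g in data if u == t and e == 1 and g == 1)

        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)

    # With no variance (e.g. one group empty) the statistic is undefined;
    # this sketch returns 0.0 in that case.
    return obs_minus_exp ** 2 / variance if variance > 0 else 0.0
```

In a comparison like the one described, the prognostic index would come from a fitted regularized Cox model evaluated on test data, and the resulting statistic would be referred to a chi-square distribution with one degree of freedom. Because the statistic depends on where the cut point is placed, two methods can be ranked differently under different splits, whereas the time-dependent AUC and the Brier-score-based R² use the continuous predictions directly.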

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Area Under Curve
  • Breast Neoplasms / genetics
  • Breast Neoplasms / pathology
  • Gene Expression Regulation*
  • Humans
  • Lymphoma, Large B-Cell, Diffuse / genetics
  • Models, Statistical
  • Neuroblastoma / genetics
  • Oligonucleotide Array Sequence Analysis*
  • Polymerase Chain Reaction
  • Prognosis
  • Proportional Hazards Models
  • ROC Curve
  • Regression Analysis
  • Survival