High-dimensional, massive sample-size Cox proportional hazards regression for survival analysis

Biostatistics. 2014 Apr;15(2):207-21. doi: 10.1093/biostatistics/kxt043. Epub 2013 Oct 4.

Abstract

Survival analysis is an old yet active research field with applications across many domains. Continuing improvements in data acquisition techniques make it increasingly challenging to apply existing survival analysis methods to the resulting data sets. In this paper, we present tools for fitting regularized Cox survival analysis models on high-dimensional, massive sample-size (HDMSS) data using a variant of the cyclic coordinate descent optimization technique tailored for the sparsity that HDMSS data often present. Experiments on two real data examples demonstrate that efficient analyses of HDMSS data using these tools result in improved predictive performance and calibration.
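To make the optimization idea concrete, the following is a minimal sketch of cyclic coordinate descent for an L1-penalized Cox model. It is not the authors' implementation. It assumes a lasso penalty, the Breslow form of the partial likelihood for tied times, a dense NumPy design matrix (so it does not exploit the sparsity the paper targets), and a one-dimensional Newton update with a soft-threshold step. The function names, the `max_step` step limit, and all default values are hypothetical choices made for illustration.

```python
import numpy as np


def soft_threshold(z, gamma):
    """Scalar soft-thresholding operator used by lasso coordinate updates."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def fit_cox_l1_ccd(X, time, event, lam, n_sweeps=100, tol=1e-8, max_step=1.0):
    """Illustrative L1-penalized Cox fit by cyclic coordinate descent.

    Minimizes -log partial likelihood (Breslow ties) + lam * ||beta||_1.
    All names and defaults here are assumptions, not the paper's API.
    """
    # Sort by decreasing time so every risk set is a prefix of the arrays.
    order = np.argsort(-time, kind="stable")
    X, time = X[order], time[order]
    event = event[order].astype(bool)

    # Last row of each subject's risk set (includes subjects with tied times).
    last = np.searchsorted(-time, -time, side="right") - 1
    last_ev = last[event]

    n, p = X.shape
    beta = np.zeros(p)
    eta = np.zeros(n)  # linear predictor X @ beta, updated incrementally

    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            xj = X[:, j]
            # Subtracting the max only rescales w; the ratios below are unchanged.
            w = np.exp(eta - eta.max())
            s0 = np.cumsum(w)[last_ev]            # risk-set sums of w
            s1 = np.cumsum(w * xj)[last_ev]       # risk-set sums of w * x_j
            s2 = np.cumsum(w * xj * xj)[last_ev]  # risk-set sums of w * x_j^2

            # First and second derivative of the negative log partial likelihood.
            grad = -(xj[event].sum() - (s1 / s0).sum())
            hess = (s2 / s0 - (s1 / s0) ** 2).sum()
            if hess <= 1e-12:
                continue  # coordinate carries no curvature; skip it

            # Penalized one-dimensional Newton step, then limit the step size.
            proposal = soft_threshold(beta[j] - grad / hess, lam / hess)
            step = np.clip(proposal - beta[j], -max_step, max_step)
            if step != 0.0:
                beta[j] += step
                eta += step * xj
                max_change = max(max_change, abs(step))
        if max_change < tol:
            break
    return beta
```

Calling `fit_cox_l1_ccd(X, time, event, lam)` with an `(n, p)` covariate array, a length-`n` array of follow-up times, and a length-`n` event indicator returns one coefficient per covariate; larger values of `lam` drive more coefficients to exactly zero. Each coordinate update here costs O(n) on dense data, which is the cost that a sparsity-aware variant would aim to reduce.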

Keywords: Big data; Cox proportional hazards; Regularized regression; Survival analysis.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Adolescent
  • Adult
  • Breast Neoplasms / genetics
  • Calibration / standards
  • Child
  • Data Interpretation, Statistical*
  • Databases, Factual / statistics & numerical data
  • Databases, Genetic / statistics & numerical data
  • Humans
  • Proportional Hazards Models*
  • Survival Analysis*
  • Wounds and Injuries / mortality