Empirical Comparison of Continuous and Discrete-time Representations for Survival Prediction

Proc Mach Learn Res. 2021 Mar:146:118-131.

Abstract

Survival prediction aims to predict the time of occurrence of a particular event of interest, such as the time until a patient dies. The main challenge in survival prediction is the presence of incomplete observations due to censoring. The classical formulation for survival prediction treats the survival time as a continuous outcome, which leads to a censored regression problem. Recent work has reformulated the survival prediction problem by discretizing time into a finite number of bins and then applying multi-task binary classification. While the discrete-time formulation is convenient and potentially requires less assumptions than the continuous-time approach, it also loses information by discretizing time. In this paper, we empirically investigate continuous and discrete-time representations for survival prediction to try to quantify the trade-offs between the two formulations. We find that discretizing time does not necessarily decrease prediction accuracy. Furthermore, discrete-time models can result in even more accurate predictors than continuous-time models, but the number of time bins used for discretization has a significant effect on accuracy and should thus be tuned as a hyperparameter rather than specified for convenience.

Keywords: Cox proportional hazards model; Survival analysis; censored regression; multi-task learning; multi-task logistic regression.