Empirical Comparison of Continuous and Discrete-time Representations for Survival Prediction

Michael Sloma; Fayeq Jeelani Syed; Mohammadreza Nemati; Kevin S Xu

Proc Mach Learn Res. 2021 Mar;146:118-131.

Authors

Michael Sloma¹, Fayeq Jeelani Syed¹, Mohammadreza Nemati¹, Kevin S Xu¹

Affiliation

¹ Electrical Engineering and Computer Science Department, University of Toledo, 2801 W. Bancroft St. MS 308, Toledo, OH 43606-3390, USA.

PMID: 34179790
PMCID: PMC8232898

Abstract

Survival prediction aims to predict the time of occurrence of a particular event of interest, such as the time until a patient dies. The main challenge in survival prediction is the presence of incomplete observations due to censoring. The classical formulation for survival prediction treats the survival time as a continuous outcome, which leads to a censored regression problem. Recent work has reformulated the survival prediction problem by discretizing time into a finite number of bins and then applying multi-task binary classification. While the discrete-time formulation is convenient and potentially requires fewer assumptions than the continuous-time approach, it also loses information by discretizing time. In this paper, we empirically investigate continuous and discrete-time representations for survival prediction to quantify the trade-offs between the two formulations. We find that discretizing time does not necessarily decrease prediction accuracy. Furthermore, discrete-time models can result in even more accurate predictors than continuous-time models, but the number of time bins used for discretization has a significant effect on accuracy and should thus be tuned as a hyperparameter rather than specified for convenience.
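
As an illustration of the discrete-time formulation described in the abstract, the following minimal sketch converts right-censored survival times into per-bin binary targets of the kind a multi-task classifier could be trained on. It is not taken from the paper: the function name `discretize_survival`, the use of equally spaced bins up to the largest observed time, and the label convention (1 once the event has occurred by the end of a bin) are all assumptions made for this example. The mask marks which bin labels are actually known, since a censored subject contributes no information after the censoring time.

```python
import numpy as np

def discretize_survival(times, events, n_bins):
    """Hypothetical helper: turn (time, event) pairs into per-bin binary labels.

    times  : observed time for each subject (event time or censoring time)
    events : 1 if the event was observed, 0 if the subject was censored
    n_bins : number of time bins (the quantity the abstract suggests tuning)
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)

    # Assumption: equally spaced bins on (0, max observed time].
    # Only the right edge of each bin is kept.
    edges = np.linspace(0.0, times.max(), n_bins + 1)[1:]

    # labels[i, k] = 1 if subject i's event has occurred by the end of bin k.
    occurred = times[:, None] <= edges[None, :]
    labels = (occurred & events[:, None]).astype(int)

    # mask[i, k] = 1 where the label is known. Uncensored subjects are known
    # in every bin; censored subjects are known only in bins that end at or
    # before their censoring time (they were still event-free there).
    known = events[:, None] | (edges[None, :] <= times[:, None])
    mask = known.astype(int)

    return labels, mask, edges

# Example: three subjects, the second one censored at time 5.
labels, mask, edges = discretize_survival([2.0, 5.0, 10.0], [1, 0, 1], n_bins=5)
print(edges)
print(labels)
print(mask)
```

In this sketch each column of `labels` is one binary classification task, and `mask` can be used to exclude unknown entries from the loss. Changing `n_bins` changes how coarsely event times are represented, which is the trade-off the abstract describes.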

Keywords: Cox proportional hazards model; Survival analysis; censored regression; multi-task learning; multi-task logistic regression.

Grants and funding

R01 LM013311/LM/NLM NIH HHS/United States