Random forests for survival data: which methods work best and under what conditions?

Int J Biostat. 2024 Apr 24. doi: 10.1515/ijb-2023-0056. Online ahead of print.

Abstract

Few systematic comparisons of methods for constructing survival trees and forests exist in the literature. Importantly, when the goal is to predict a survival time or estimate a survival function, the optimal choice of method is unclear. We use an extensive simulation study to systematically investigate various factors that influence survival forest performance - forest construction method, censoring, sample size, distribution of the response, structure of the linear predictor, and presence of correlated or noisy covariates. In particular, we study 11 methods that have recently been proposed in the literature and identify 6 top performers. We find that all the factors that we investigate have significant impact on the methods' relative accuracy of point predictions of survival times and survival function estimates. We use our results to make recommendations for which methods to use in a given context and offer explanations for the observed differences in relative performance.

Keywords: point prediction; random forest; random survival forest; survival analysis; survival function estimation.