Optimal subsampling for parametric accelerated failure time models with massive survival data

Stat Med. 2022 Nov 30;41(27):5421-5431. doi: 10.1002/sim.9576. Epub 2022 Sep 20.

Abstract

With the increasing availability of massive survival data, researchers need valid statistical inference for survival modeling whose computation is not limited by computer memory. Existing works focus on relative risk models using online updating and divide-and-conquer strategies. The subsampling strategy has not been available because of challenges in developing the asymptotic properties of the estimator under semiparametric models with censored data. This article develops optimal subsampling algorithms to quickly approximate the maximum likelihood estimator for parametric accelerated failure time models with massive survival data. We derive the asymptotic distribution of the subsampling estimator and the optimal sampling probabilities that minimize the asymptotic mean squared error of the estimator. A feasible two-step algorithm is proposed in which the optimal sampling probabilities used in the second step are estimated from a pilot sample drawn in the first step. The asymptotic properties of the two-step estimator are established. The performance of the estimator is validated in a simulation study. A real data analysis illustrates the usefulness of the methods.
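The two-step idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: a Weibull accelerated failure time model (log time = x'beta + sigma * error, with standard extreme-value errors), second-step probabilities proportional to the norm of each observation's score at the pilot estimate (the form L-optimality-type criteria take in related subsampling work), a 10% mix with uniform probabilities to keep weights bounded, and the hypothetical function names `scores`, `weighted_mle`, and `two_step_subsample`. The exact probabilities derived in the article are not reproduced here.

```python
import numpy as np
from scipy.optimize import minimize


def scores(theta, X, y, delta):
    """Per-observation scores of the Weibull AFT log-likelihood.

    theta = (beta, s) with s = log(sigma); y is the log observed time and
    delta is 1 for an event, 0 for a censored observation.
    """
    beta, s = theta[:-1], theta[-1]
    z = (y - X @ beta) * np.exp(-s)
    u = np.exp(np.minimum(z, 50.0)) - delta  # cap z to guard against overflow
    return np.column_stack([X * (u * np.exp(-s))[:, None], u * z - delta])


def weighted_mle(X, y, delta, w, theta0):
    """Maximize the weighted log-likelihood sum_i w_i * l_i(theta)."""
    w = w / w.sum()  # rescaling the weights does not change the maximizer

    def negloglik(theta):
        beta, s = theta[:-1], theta[-1]
        z = (y - X @ beta) * np.exp(-s)
        ll = delta * (z - s) - np.exp(np.minimum(z, 50.0))
        grad = (w[:, None] * scores(theta, X, y, delta)).sum(axis=0)
        return -np.sum(w * ll), -grad

    return minimize(negloglik, theta0, jac=True, method="BFGS").x


def two_step_subsample(X, y, delta, r0, r, rng):
    """Pilot estimate from a uniform subsample, then a weighted fit on a
    subsample drawn with probabilities estimated from the pilot."""
    n = len(y)

    # Step 1: uniform pilot subsample and unweighted MLE.
    idx0 = rng.choice(n, size=r0, replace=True)
    beta_ls = np.linalg.lstsq(X[idx0], y[idx0], rcond=None)[0]  # crude start
    pilot = weighted_mle(X[idx0], y[idx0], delta[idx0],
                         np.ones(r0), np.append(beta_ls, 0.0))

    # Step 2: probabilities proportional to the score norm at the pilot
    # estimate (assumption), mixed with uniform to bound the weights.
    norms = np.linalg.norm(scores(pilot, X, y, delta), axis=1)
    pi = 0.9 * norms / norms.sum() + 0.1 / n
    idx1 = rng.choice(n, size=r, replace=True, p=pi)

    # Inverse-probability weights correct for the non-uniform sampling.
    return weighted_mle(X[idx1], y[idx1], delta[idx1], 1.0 / pi[idx1], pilot)
```

The function returns the estimate of (beta, log sigma). Computing the sampling probabilities still requires one pass over the full data, but the optimization itself runs only on the `r0` and `r` subsampled rows, which is where the computational saving would come from.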

Keywords: A-optimality; L-optimality; censoring; survival analysis.

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms*
  • Computer Simulation
  • Data Analysis*
  • Humans
  • Models, Statistical
  • Probability
  • Survival Analysis
  • Time Factors