Sampling-based estimation for massive survival data with additive hazards model

Lulu Zuo; Haixiang Zhang; HaiYing Wang; Lei Liu

doi:10.1002/sim.8783

Stat Med. 2021 Jan 30;40(2):441-450. doi: 10.1002/sim.8783. Epub 2020 Nov 3.

Authors

Lulu Zuo¹, Haixiang Zhang¹, HaiYing Wang², Lei Liu³

Affiliations

¹ Center for Applied Mathematics, Tianjin University, Tianjin, China.
² Department of Statistics, University of Connecticut, Mansfield, Connecticut, USA.
³ Division of Biostatistics, Washington University in St. Louis, St. Louis, Missouri, USA.

Abstract

For massive survival data, we propose a subsampling algorithm to efficiently approximate the estimates of regression parameters in the additive hazards model. We establish consistency and asymptotic normality of the subsample-based estimator given the full data. The optimal subsampling probabilities are obtained by minimizing the asymptotic variance of the resulting estimator. The subsample-based procedure can substantially reduce the computational cost compared with the full-data method. In numerical simulations, our method has low bias and satisfactory coverage probabilities. We provide an illustrative example on the survival analysis of patients with lymphoma from the Surveillance, Epidemiology, and End Results Program.
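The general idea of a subsample-based estimator can be illustrated with a minimal sketch. It is not the authors' implementation, and several pieces are assumptions. It assumes the standard Lin and Ying estimating equation for the additive hazards model with time-invariant covariates, inverse-probability weights 1/(r π_i), and uniform sampling probabilities standing in for the optimal probabilities, which the abstract does not state. The names simulate_additive_hazards, weighted_lin_ying, and subsample_estimate are hypothetical, as are the toy data settings.

```python
import numpy as np


def simulate_additive_hazards(n, beta, rng, baseline=1.0, censor_rate=0.5):
    """Toy data from lambda(t | Z) = baseline + beta'Z, Z ~ U(0, 1)^p (assumed setup)."""
    beta = np.asarray(beta, dtype=float)
    Z = rng.uniform(size=(n, beta.size))
    # A hazard that is constant in t implies exponential event times.
    T = rng.exponential(1.0 / (baseline + Z @ beta))
    C = rng.exponential(1.0 / censor_rate, size=n)  # independent censoring
    return np.minimum(T, C), (T <= C).astype(float), Z


def weighted_lin_ying(time, event, Z, w):
    """Closed-form weighted Lin-Ying estimate of beta (time-invariant covariates)."""
    order = np.argsort(time, kind="stable")
    t, d, Z, w = time[order], event[order], Z[order], w[order]
    wZ = w[:, None] * Z
    # Reverse cumulative sums give weighted risk-set totals over {j : t_j >= t_k}.
    S0 = np.cumsum(w[::-1])[::-1]
    S1 = np.cumsum(wZ[::-1], axis=0)[::-1]
    S2 = np.cumsum((wZ[:, :, None] * Z[:, None, :])[::-1], axis=0)[::-1]
    Zbar = S1 / S0[:, None]  # weighted risk-set mean of Z at each sorted time
    dt = np.diff(t, prepend=0.0)  # the risk set is constant between sorted times
    centered = S2 - S0[:, None, None] * Zbar[:, :, None] * Zbar[:, None, :]
    A = np.einsum("k,kij->ij", dt, centered)
    # With tied times, use the risk set that starts at the first tied index.
    first = np.searchsorted(t, t, side="left")
    b = ((w * d)[:, None] * (Z - Zbar[first])).sum(axis=0)
    return np.linalg.solve(A, b)


def subsample_estimate(time, event, Z, r, rng, probs=None):
    """Draw r indices with replacement and solve the weighted estimating equation."""
    n = time.shape[0]
    if probs is None:
        # Assumption: uniform probabilities, not the paper's optimal ones.
        probs = np.full(n, 1.0 / n)
    idx = rng.choice(n, size=r, replace=True, p=probs)
    w = 1.0 / (r * probs[idx])  # inverse-probability weights
    return weighted_lin_ying(time[idx], event[idx], Z[idx], w)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    time, event, Z = simulate_additive_hazards(50_000, [0.5, 1.0], rng)
    print("full data:", weighted_lin_ying(time, event, Z, np.ones(time.shape[0])))
    print("subsample:", subsample_estimate(time, event, Z, 5_000, rng))
```

Because the estimate has a closed form, the cost is dominated by sorting and cumulative sums over the r sampled rows rather than over all n rows. Non-uniform probabilities can be passed through probs, provided they are positive and sum to one.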

Keywords: additive hazards model; big data; subsample-based estimator; subsampling probabilities; survival analysis.

Publication types

Research Support, N.I.H., Extramural
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Algorithms*
Bias
Humans
Probability
Proportional Hazards Models
Survival Analysis

Grants and funding

UL1 TR002345/TR/NCATS NIH HHS/United States