Which test for crossing survival curves? A user's guideline

Ina Dormuth; Tiantian Liu; Jin Xu; Menggang Yu; Markus Pauly; Marc Ditzhaus

doi:10.1186/s12874-022-01520-0

BMC Med Res Methodol. 2022 Jan 30;22(1):34. doi: 10.1186/s12874-022-01520-0.

Authors

Ina Dormuth¹, Tiantian Liu², Jin Xu³, Menggang Yu⁴, Markus Pauly⁵, Marc Ditzhaus⁵

Affiliations

¹ TU Dortmund University, Joseph-von-Fraunhofer-Straße 2-4, 44221, Dortmund, Germany. Ina.dormuth@tu-dortmund.de.
² Technion - Israel Institute of Technology, Haifa, Israel.
³ East China Normal University, Shanghai, China.
⁴ University of Wisconsin-Madison, Madison, USA.
⁵ TU Dortmund University, Joseph-von-Fraunhofer-Straße 2-4, 44221, Dortmund, Germany.

Abstract

Background: The exchange of knowledge between statisticians who develop new methodology and the clinicians, reviewers or authors who apply it is fundamental. This is especially true for clinical trials with time-to-event endpoints. One of the most common questions in this setting is whether the survival distributions in a two-armed trial are equal. The log-rank test is still the gold standard for addressing this question. However, under non-proportional hazards its power can be poor, and multiple extensions have been developed to overcome this issue. We aim to facilitate the choice of a test for detecting survival differences in the case of crossing hazards.

Methods: We restricted the review to the most recent two-armed clinical oncology trials with crossing survival curves. Each data set was reconstructed using a state-of-the-art reconstruction algorithm. To ensure reproduction quality, only publications with published numbers at risk at multiple time points, sufficient print quality and a non-informative censoring pattern were included. This article reports the p-values of the log-rank and Peto-Peto tests as references and compares them with those of nine different tests developed to detect survival differences in the presence of non-proportional or crossing hazards.
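As an illustration of the reference test named in the Methods, the following is a minimal sketch of the standard two-sample log-rank test using only the Python standard library. It is not the authors' code: the function name `logrank_test` and the toy data are assumptions made for this example, and the p-value relies on the usual asymptotic chi-square approximation with one degree of freedom.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-sample log-rank test (hypothetical helper, not from the paper).

    times_*  : follow-up times per patient
    events_* : 1 if the event was observed, 0 if the patient was censored
    Returns (chi-square statistic, asymptotic p-value).
    """
    # Pool both arms as (time, event indicator, group label).
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]

    # Distinct time points at which at least one event occurred.
    event_times = sorted({t for t, e, _ in data if e})

    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        # Numbers at risk just before time t in each arm.
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)
        n = n_a + n_b
        # Events at time t in arm A and in both arms combined.
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        d = sum(1 for u, e, _ in data if u == t and e)

        # Observed minus expected events in arm A under equal hazards.
        observed_minus_expected += d_a - d * n_a / n
        # Hypergeometric variance contribution at time t.
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0

    chi2 = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Toy data (assumed for illustration): every patient has an observed event.
arm_a_times = [1, 2, 3, 4, 5]
arm_b_times = [6, 7, 8, 9, 10]
all_events = [1, 1, 1, 1, 1]
chi2, p_value = logrank_test(arm_a_times, all_events, arm_b_times, all_events)
```

Because the statistic sums observed-minus-expected events over time with equal weights, early and late differences of opposite sign can cancel when the hazards cross, which is the loss of power that motivates the alternative tests compared in this review.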

Results: We reviewed 1400 recent phase III clinical oncology trials and selected fifteen studies that met our eligibility criteria for data reconstruction. After including three further individual patient data sets, significant differences in survival were found for nine of the eighteen studies using the investigated tests. An important point for reviewers is that 28% of the studies with published survival curves did not report the number at risk, which makes reconstruction and plausibility checks almost impossible.

Conclusions: The evaluation shows that inference methods constructed to detect differences in survival in the presence of non-proportional hazards are beneficial, and it helps to provide guidance in choosing a sensible alternative to the standard log-rank test.

Keywords: Crossing; Log-rank test; Non-proportional hazards; Oncology; Restricted-mean survival; Survival analysis; Time-to-event outcome.

Publication types

Research Support, Non-U.S. Gov't
Review

MeSH terms

Clinical Trials, Phase III as Topic
Humans
Neoplasms* / diagnosis
Neoplasms* / therapy
Proportional Hazards Models
Research Design*
Survival Analysis