Multiply imputing missing values arising by design in transplant survival data

Biom J. 2020 Sep;62(5):1192-1207. doi: 10.1002/bimj.201800253. Epub 2020 Feb 20.

Abstract

In this article, we address a missing data problem that occurs in transplant survival studies. Recipients of organ transplants are followed up from transplantation and their survival times recorded, together with various explanatory variables. Because data collection procedures differ between centers or change over time, a particular explanatory variable (or set of variables) may be recorded only for certain recipients, so that this variable is missing for a substantial number of records in the data. The variable may also turn out to be an important predictor of survival, so it is important to handle this missing-by-design problem appropriately. The consensus in the literature is to handle this problem with complete case analysis, because the missing data are assumed to arise under a missing at random mechanism for which complete case analysis gives consistent estimates. Specifically, whether a value is missing can reasonably be assumed to be unrelated to the survival time. In this article, we investigate the potential of multiple imputation to handle this problem in a relevant study of survival after kidney transplantation, and show that it comprehensively outperforms complete case analysis on a range of measures. This is a particularly important finding in the medical context, where imputing large amounts of missing data is often viewed with scepticism.
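The abstract does not state how the analyses of the imputed datasets were combined. The standard way to pool results after multiple imputation is Rubin's rules. The sketch below assumes that each of the M imputed datasets has already been fitted with a survival model (for example, a Cox model), giving one coefficient such as a log hazard ratio and its squared standard error per dataset. The function name `pool_rubin` and the numbers in the usage example are hypothetical and illustrative; they are not taken from the study.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import mean, variance


@dataclass
class PooledEstimate:
    estimate: float     # pooled point estimate (mean over imputations)
    within_var: float   # average within-imputation variance, W
    between_var: float  # between-imputation variance, B
    total_var: float    # total variance, T = W + (1 + 1/M) * B

    @property
    def std_error(self) -> float:
        return sqrt(self.total_var)


def pool_rubin(estimates: list[float], variances: list[float]) -> PooledEstimate:
    """Combine per-imputation estimates and their variances with Rubin's rules.

    Hypothetical helper: `estimates` holds one coefficient per imputed dataset
    and `variances` holds the matching squared standard errors.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)      # pooled point estimate
    w = mean(variances)          # within-imputation variance
    b = variance(estimates)      # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b      # total variance inflates W for imputation uncertainty
    return PooledEstimate(q_bar, w, b, t)


if __name__ == "__main__":
    # Illustrative log hazard ratios and squared standard errors from M = 5 imputations.
    pooled = pool_rubin([0.40, 0.44, 0.38, 0.42, 0.46],
                        [0.010, 0.012, 0.011, 0.009, 0.013])
    print(f"log HR = {pooled.estimate:.4f}, SE = {pooled.std_error:.4f}")
```

The between-imputation term B is what distinguishes this from simply averaging: it widens the standard error to reflect uncertainty about the missing values. Complete case analysis has no such term, but it discards every record in which the by-design variable is absent.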

Keywords: missing data; multiple imputation; stepwise selection; survival analysis; transplant data.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Data Collection*
  • Graft Survival
  • Humans
  • Research Design*
  • Transplantation* / mortality