Correcting conditional mean imputation for censored covariates and improving usability

Biom J. 2022 Jun;64(5):858-862. doi: 10.1002/bimj.202100250. Epub 2022 Feb 24.

Abstract

Missing data are often overcome using imputation, which leverages the entire dataset to replace missing values with informed placeholders. This method can be modified for censored data by also incorporating partial information from censored values. One such modification proposed by Atem et al. (2017, 2019a, 2019b) is conditional mean imputation where censored covariates are replaced by their conditional means given other fully observed information. These methods are robust to additional parametric assumptions on the censored covariate and utilize all available data, which is appealing. However, in implementing these methods, we discovered that these three articles provide nonequivalent formulas and, in fact, none is the correct formula for the conditional mean. Herein, we derive the correct form of the conditional mean and discuss the bias incurred when using the incorrect formulas. Furthermore, we note that even the correct formula can perform poorly for log hazard ratios far from 0. We also provide user-friendly R software, the imputeCensoRd package, to enable future researchers to tackle censored covariates correctly.
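To make the idea of conditional mean imputation concrete, the sketch below replaces a value censored at c with an approximation of E[X | X > c]. It relies on the standard identity for a nonnegative variable, E[X | X > c] = c + (1 / S(c)) ∫ S(x) dx taken from c upward, where S is the survival function, and evaluates the integral with the trapezoidal rule named in the keywords. This abstract does not reproduce the article's own formula, so the identity, the function name `conditional_mean_beyond`, the exponential survival function, and the truncation point `upper` are all assumptions chosen for illustration. None of this is the imputeCensoRd interface.

```python
import math


def conditional_mean_beyond(c, surv, upper, n_steps=10_000):
    """Approximate E[X | X > c] = c + (1 / S(c)) * integral of S(x) from c to upper.

    `surv` is a survival function S(x) = P(X > x). `upper` truncates the
    infinite integral and is an assumption of this sketch: it should be
    large enough that S(upper) is negligible.
    """
    h = (upper - c) / n_steps
    # Trapezoidal rule: endpoints get half weight, interior points full weight.
    total = 0.5 * (surv(c) + surv(upper))
    for i in range(1, n_steps):
        total += surv(c + i * h)
    integral = h * total
    return c + integral / surv(c)


# Hypothetical example: exponential covariate with rate 0.5, censored at c = 2.
rate = 0.5
c = 2.0


def exp_surv(x):
    return math.exp(-rate * x)


approx = conditional_mean_beyond(c, exp_surv, upper=c + 60.0)
exact = c + 1.0 / rate  # memoryless property of the exponential distribution
print(approx, exact)
```

The exponential case is used only because its conditional mean is known in closed form, which gives a check on the numerical integration. In a regression setting the survival function would instead be estimated conditionally on the fully observed covariates.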

Keywords: R package; random censoring; reproducibility; survival analysis; trapezoidal rule.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bias
  • Computer Simulation
  • Models, Statistical*
  • Proportional Hazards Models