Confidence intervals after multiple imputation: combining profile likelihood information from logistic regressions

Stat Med. 2013 Dec 20;32(29):5062-76. doi: 10.1002/sim.5899. Epub 2013 Jul 19.

Abstract

In the logistic regression analysis of a small-sized, case-control study on Alzheimer's disease, some of the risk factors exhibited missing values, motivating the use of multiple imputation. Usually, Rubin's rules (RR) for combining point estimates and variances would then be used to estimate (symmetric) confidence intervals (CIs), on the assumption that the regression coefficients were distributed normally. Yet, rarely is this assumption tested, with or without transformation. In analyses of small, sparse, or nearly separated data sets, such symmetric CIs may not be reliable. Thus, RR alternatives have been considered, for example, Bayesian sampling methods, but not yet those that combine profile likelihoods, particularly penalized profile likelihoods, which can remove first-order biases and guarantee convergence of parameter estimation. To fill the gap, we consider the combination of penalized likelihood profiles (CLIP) by expressing them as posterior cumulative distribution functions (CDFs) obtained via a chi-squared approximation to the penalized likelihood ratio statistic. CDFs from multiple imputations can then easily be averaged into a combined CDF_c, allowing confidence limits for a parameter β at level 1 - α to be identified as those β* and β** that satisfy CDF_c(β*) = α/2 and CDF_c(β**) = 1 - α/2. We demonstrate that the CLIP method outperforms RR in analyzing both simulated data and data from our motivating example. CLIP can also be useful as a confirmatory tool, should it show that the simpler RR are adequate for extended analysis. We also compare the performance of CLIP to Bayesian sampling methods using Markov chain Monte Carlo. CLIP is available in the R package logistf.
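The combination step described in the abstract can be illustrated with a minimal Python sketch. It is not the logistf implementation, and every name in it (`quadratic_profile`, `profile_cdf`, `combined_cdf`, `clip_interval`) is hypothetical. It assumes each imputed data set supplies a point estimate and a profile log-likelihood function for β. Quadratic toy profiles stand in here for the penalized profiles that a real analysis would obtain from Firth-type logistic fits. Each profile is turned into a CDF via the chi-squared (1 degree of freedom) approximation to the likelihood ratio statistic, the CDFs are averaged, and the limits are found by bisection, which assumes the combined CDF is non-decreasing within the search bracket.

```python
import math


def quadratic_profile(beta_hat, se):
    """Toy stand-in (assumption): a normal-shaped profile log-likelihood."""
    def loglik(beta):
        return -0.5 * ((beta - beta_hat) / se) ** 2
    return beta_hat, loglik


def profile_cdf(beta, beta_hat, loglik):
    """CDF implied by a chi-squared(1) approximation to the LR statistic.

    Uses the identity P(chi2_1 <= x) = erf(sqrt(x / 2)), so the result
    equals the standard normal CDF of the signed root of the LR statistic.
    """
    lr = max(0.0, 2.0 * (loglik(beta_hat) - loglik(beta)))
    signed_root = math.copysign(math.sqrt(lr), beta - beta_hat)
    return 0.5 * (1.0 + math.erf(signed_root / math.sqrt(2.0)))


def combined_cdf(beta, profiles):
    """Average the per-imputation CDFs into a single combined CDF."""
    total = sum(profile_cdf(beta, b_hat, ll) for b_hat, ll in profiles)
    return total / len(profiles)


def solve_for(target, profiles, lo, hi, tol=1e-10):
    """Bisection for the beta at which the combined CDF equals target."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if combined_cdf(mid, profiles) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def clip_interval(profiles, alpha=0.05, lo=-50.0, hi=50.0):
    """Limits beta*, beta** with CDF_c(beta*) = alpha/2, CDF_c(beta**) = 1 - alpha/2."""
    lower = solve_for(alpha / 2.0, profiles, lo, hi)
    upper = solve_for(1.0 - alpha / 2.0, profiles, lo, hi)
    return lower, upper


# Three hypothetical imputations with slightly different estimates.
profiles = [quadratic_profile(0.8, 0.40),
            quadratic_profile(1.1, 0.50),
            quadratic_profile(0.9, 0.45)]
lower, upper = clip_interval(profiles)
print(f"95% CLIP-style interval: ({lower:.3f}, {upper:.3f})")
```

With a single quadratic profile, this construction reduces to the usual Wald interval. With asymmetric profiles, the limits need not be symmetric about the point estimate, which is the situation the abstract has in mind for small or nearly separated data sets.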

Keywords: bias reduction; missing covariate values; non-convergence of maximum likelihood estimation; penalized likelihood; small samples.

MeSH terms

  • Aged
  • Aged, 80 and over
  • Alzheimer Disease / etiology
  • Case-Control Studies
  • Chi-Square Distribution*
  • Computer Simulation
  • Confidence Intervals*
  • Female
  • Humans
  • Likelihood Functions*
  • Logistic Models*
  • Male
  • Middle Aged
  • Regression Analysis*
  • Socioeconomic Factors