Non-ignorable missingness in logistic regression

Stat Med. 2017 Aug 30;36(19):3005-3021. doi: 10.1002/sim.7349. Epub 2017 Jun 2.

Abstract

Nonresponse and missing data are common in observational studies. Ignoring or inadequately handling missing data may lead to biased parameter estimates, incorrect standard errors and, as a consequence, incorrect statistical inference and conclusions. We present a strategy for modelling non-ignorable missingness where the probability of nonresponse depends on the outcome. Using a simple case of logistic regression, we quantify the bias in regression estimates and show that the observed likelihood is non-identifiable under a non-ignorable missing data mechanism. We then adopt a selection model factorisation of the joint distribution as the basis for a sensitivity analysis to study changes in estimated parameters and the robustness of study conclusions to different assumptions. We use a Bayesian framework for model estimation because it provides a flexible approach for incorporating different missing data assumptions and conducting sensitivity analysis. Using simulated data, we explore the performance of the Bayesian selection model in correcting for bias in a logistic regression. We then implement our strategy using survey data from the 45 and Up Study to investigate factors associated with worsening health from the baseline to the follow-up survey. Our findings have practical implications for the use of the 45 and Up Study data to answer important research questions relating to health and quality of life. Copyright © 2017 John Wiley & Sons, Ltd.
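
The outcome-dependent mechanism described in the abstract can be illustrated with a small simulation. This is a simplified sketch, not the authors' Bayesian selection model. It assumes, for illustration only, that the response probability depends on the binary outcome alone, P(R=1 | Y=y, X) = pi_y, with no dependence on the covariate X. Under that assumption Bayes' rule gives logit P(Y=1 | X, R=1) = beta0 + log(pi1/pi0) + beta1*X, so a complete-case fit shifts the intercept by log(pi1/pi0) while leaving the slope unchanged. The parameter values and the helper names simulate, fit_logistic and corrected_intercept are hypothetical.

```python
import math
import random

# Hypothetical illustration of outcome-dependent nonresponse in logistic
# regression. Assumes P(R=1 | Y=y, X) = pi_y: response depends on y only.


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n, beta0, beta1, pi0, pi1, seed=2017):
    """Return (x, y, r) triples; r = 1 means the outcome y was observed."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 1 if rng.random() < expit(beta0 + beta1 * x) else 0
        pi = pi1 if y == 1 else pi0          # response probability given y
        r = 1 if rng.random() < pi else 0
        data.append((x, y, r))
    return data


def fit_logistic(xs, ys, max_iter=50, tol=1e-8):
    """Maximum likelihood fit of logit P(Y=1|x) = b0 + b1*x by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0             # score vector
        h00 = h01 = h11 = 0.0     # Fisher information matrix entries
        for x, y in zip(xs, ys):
            p = expit(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det   # inverse 2x2 information times score
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


def corrected_intercept(b0_cc, assumed_ratio):
    """Remove the offset log(pi1/pi0) for an assumed response ratio pi1/pi0."""
    return b0_cc - math.log(assumed_ratio)


beta0, beta1 = -1.0, 0.8          # hypothetical true coefficients
pi0, pi1 = 0.8, 0.4               # hypothetical: y=1 responds half as often
data = simulate(50_000, beta0, beta1, pi0, pi1)
complete = [(x, y) for x, y, r in data if r == 1]
b0_cc, b1_cc = fit_logistic([x for x, _ in complete], [y for _, y in complete])

print(f"complete-case intercept {b0_cc:.3f}, predicted {beta0 + math.log(pi1 / pi0):.3f}")
print(f"complete-case slope     {b1_cc:.3f}, true      {beta1:.3f}")

# Sensitivity analysis: report the corrected intercept for several
# assumed (unverifiable) response ratios pi1/pi0.
for ratio in (0.25, 0.5, 1.0, 2.0):
    print(f"assumed pi1/pi0 = {ratio}: corrected intercept "
          f"{corrected_intercept(b0_cc, ratio):.3f}")
```

The final loop mimics a sensitivity analysis in the spirit of the abstract: the response ratio is treated as an assumed value and varied, and the corrected intercept is reported for each choice. If the response ratio between outcome groups also varies with X, the slope is affected as well and this simple offset correction no longer applies; a joint selection model of the outcome and the response mechanism is then required.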

Keywords: 45 and Up Study; Bayesian selection model; nonresponse; sensitivity analysis.

MeSH terms

  • Aged
  • Bayes Theorem
  • Bias*
  • Biometry / methods*
  • Computer Simulation
  • Female
  • Humans
  • Logistic Models*
  • Longitudinal Studies
  • Male
  • Middle Aged
  • New South Wales
  • Surveys and Questionnaires*