Regularized approach for data missing not at random

Stat Methods Med Res. 2019 Jan;28(1):134-150. doi: 10.1177/0962280217717760. Epub 2017 Jul 3.

Abstract

In longitudinal studies, missing data commonly arise from subjects' nonresponse, missed visits, dropout, death, or other causes during the course of the study. A valid analysis in this setting has to account for data missing not at random (MNAR). However, models for MNAR data often suffer from identifiability problems, which make estimation difficult and hinder computational convergence. To ameliorate this issue, we propose LASSO- and ridge-regularized selection models that regularize the missing data mechanism model to handle MNAR data, with the regularization parameter selected via a cross-validation procedure. The proposed models can also be employed for sensitivity analysis, to examine how different assumptions about the missing data mechanism affect inference. We illustrate the performance of the proposed models via simulation studies and the analysis of data from a randomized clinical trial.
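
The idea of regularizing the missing data mechanism can be written out as a minimal sketch. The notation below is illustrative and is not taken from the paper: it assumes a single outcome y_i that may be missing (r_i = 1 if missing), covariates x_i, a logistic missingness model, and a penalty placed only on the coefficient \psi_y that links missingness to the possibly unobserved outcome. The paper itself treats longitudinal data and lists pseudo likelihood among its keywords, so its actual objective function may differ.

```latex
% Selection-model factorization (assumed form):
f(y_i, r_i \mid x_i; \theta, \psi)
  = f(y_i \mid x_i; \theta)\, \Pr(r_i \mid y_i, x_i; \psi),
\qquad
\operatorname{logit} \pi_i(y) = \operatorname{logit} \Pr(r_i = 1 \mid y, x_i; \psi)
  = \psi_0 + \psi_x^{\top} x_i + \psi_y\, y .

% Observed-data log-likelihood: observed outcomes contribute directly,
% missing outcomes are integrated out.
\ell(\theta, \psi)
  = \sum_{i:\, r_i = 0} \log\bigl[ f(y_i \mid x_i; \theta)\,\{1 - \pi_i(y_i)\} \bigr]
  + \sum_{i:\, r_i = 1} \log \int f(y \mid x_i; \theta)\, \pi_i(y)\, dy .

% Penalized objectives (assumption: only \psi_y is penalized):
\ell_\lambda^{\text{LASSO}}(\theta, \psi) = \ell(\theta, \psi) - \lambda\, |\psi_y| ,
\qquad
\ell_\lambda^{\text{ridge}}(\theta, \psi) = \ell(\theta, \psi) - \lambda\, \psi_y^{2} .
```

Under these assumptions, \psi_y = 0 means that missingness does not depend on the unobserved outcome given x_i (missing at random), while \psi_y \neq 0 gives an MNAR mechanism. Setting \lambda = 0 recovers the unpenalized selection model, and letting \lambda grow shrinks \psi_y toward 0, so varying \lambda moves between the two assumptions. This is one way to read the abstract's remark that the models can also serve for sensitivity analysis.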

Keywords: LASSO regression; Missing at random; pseudo likelihood; ridge regression; selection model.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cough / etiology
  • Data Accuracy*
  • Data Interpretation, Statistical*
  • Humans
  • Likelihood Functions
  • Longitudinal Studies
  • Models, Statistical
  • Patient Dropouts / statistics & numerical data
  • Regression Analysis
  • Scleroderma, Systemic / complications