Missing data: A statistical framework for practice

James R Carpenter; Melanie Smuk

doi:10.1002/bimj.202000196

Biom J. 2021 Jun;63(5):915-947. doi: 10.1002/bimj.202000196. Epub 2021 Feb 24.

Authors

James R Carpenter¹,², Melanie Smuk¹

Affiliations

¹ Department of Medical Statistics, London School of Hygiene & Tropical Medicine, London, UK.
² MRC Clinical Trials Unit at UCL, London, UK.

Abstract

Missing data are ubiquitous in medical research, yet there is still uncertainty over when restricting to the complete records is likely to be acceptable, when more complex methods (e.g. maximum likelihood, multiple imputation and Bayesian methods) should be used, how they relate to each other and the role of sensitivity analysis. This article seeks to address both applied practitioners and researchers interested in a more formal explanation of some of the results. For practitioners, the framework, illustrative examples and code should equip them with a practical approach to address the issues raised by missing data (particularly using multiple imputation), alongside an overview of how the various approaches in the literature relate. In particular, we describe how multiple imputation can be readily used for sensitivity analyses, which are still infrequently performed. For those interested in more formal derivations, we give outline arguments for key results, use simple examples to show how methods relate, and provide references for full details. The ideas are illustrated with a cohort study, a multi-centre case-control study and a randomised clinical trial.

Keywords: complete records; missing data; multiple imputation; sensitivity analysis.

Publication types

Research Support, Non-U.S. Gov't
Review

MeSH terms

Bayes Theorem
Case-Control Studies*
Cohort Studies
Data Interpretation, Statistical
Humans
Uncertainty
