Multiple imputation of missing data under missing at random: compatible imputation models are not sufficient to avoid bias if they are mis-specified

Elinor Curnow; James R Carpenter; Jon E Heron; Rosie P Cornish; Stefan Rach; Vanessa Didelez; Malte Langeheine; Kate Tilling

doi:10.1016/j.jclinepi.2023.06.011


J Clin Epidemiol. 2023 Aug:160:100-109. doi: 10.1016/j.jclinepi.2023.06.011. Epub 2023 Jun 19.

Authors

Elinor Curnow¹, James R Carpenter², Jon E Heron³, Rosie P Cornish³, Stefan Rach⁴, Vanessa Didelez⁵, Malte Langeheine⁶, Kate Tilling³

Affiliations

¹ Department of Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK; Medical Research Council Integrative Epidemiology Unit at the University of Bristol, University of Bristol, Bristol, UK. Electronic address: elinor.curnow@bristol.ac.uk.
² Department of Medical Statistics, London School of Hygiene and Tropical Medicine, University of London, London, UK; Medical Research Council Clinical Trials Unit at University College London, University of London, London, UK.
³ Department of Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK; Medical Research Council Integrative Epidemiology Unit at the University of Bristol, University of Bristol, Bristol, UK.
⁴ Department of Epidemiological Methods and Etiological Research, Leibniz Institute for Prevention Research and Epidemiology - BIPS, Bremen, Germany.
⁵ Faculty of Mathematics/Computer Science, University of Bremen, Bremen, Germany; Department of Biometry and Data Management, Leibniz Institute for Prevention Research and Epidemiology - BIPS, Bremen, Germany.
⁶ Department of Health and Consumer Protection, Senator for Health, Women and Consumer Protection, Bremen, Germany.

Abstract

Objectives: Epidemiological studies often have missing data, which are commonly handled by multiple imputation (MI). Standard (default) MI procedures include covariates in the imputation model only as simple linear terms. We examine the bias that may result from accepting this default option and evaluate methods to identify problematic imputation models, providing practical guidance for researchers.
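To make the "default" concrete, here is a minimal NumPy sketch of one common variant: Bayesian normal linear-regression imputation in which the covariate enters only as a linear term (similar in spirit to the `norm` method in R's mice package), with estimates pooled by Rubin's rules. The data-generating model, the missingness mechanism, the seed and the helper names `impute_linear_normal` and `rubin_pool` are illustrative assumptions, not taken from the paper. In this toy case the true relationship is linear, so the default model happens to be correctly specified.

```python
import numpy as np

def impute_linear_normal(x, y, rng):
    """One imputation of missing y from a normal linear model y ~ 1 + x.

    Covariates enter linearly only (the 'default' form); regression
    parameters are drawn from their approximate posterior under a flat
    prior before the imputed values are drawn.
    """
    obs = ~np.isnan(y)
    design = np.column_stack([np.ones_like(x), x])   # linear covariate term only
    d_obs, y_obs = design[obs], y[obs]
    n_obs, p = d_obs.shape
    xtx_inv = np.linalg.inv(d_obs.T @ d_obs)
    beta_hat = xtx_inv @ d_obs.T @ y_obs
    rss = np.sum((y_obs - d_obs @ beta_hat) ** 2)
    sigma2 = rss / rng.chisquare(n_obs - p)                       # draw sigma^2
    beta = rng.multivariate_normal(beta_hat, sigma2 * xtx_inv)   # draw beta given sigma^2
    y_imp = y.copy()
    mu = design[~obs] @ beta
    y_imp[~obs] = mu + rng.normal(0.0, np.sqrt(sigma2), mu.size)
    return y_imp

def rubin_pool(estimates, variances):
    """Rubin's rules: pooled estimate and total variance W + (1 + 1/M) B."""
    m = len(estimates)
    within = np.mean(variances)
    between = np.var(estimates, ddof=1)
    return np.mean(estimates), within + (1 + 1 / m) * between

rng = np.random.default_rng(2023)
n = 2000
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)        # hypothetical linear truth, so E[Y] = 1
p_miss = 1 / (1 + np.exp(1 - 1.5 * x))        # MAR: high x more often missing
y[rng.uniform(size=n) < p_miss] = np.nan

ests, within_vars = [], []
for _ in range(20):
    y_m = impute_linear_normal(x, y, rng)
    ests.append(y_m.mean())
    within_vars.append(y_m.var(ddof=1) / n)   # variance of a sample mean
est, total_var = rubin_pool(ests, within_vars)
cra = np.nanmean(y)
print(f"CRA mean {cra:.3f}; MI mean {est:.3f} (SE {np.sqrt(total_var):.3f})")
```

Because missingness of Y depends only on the observed X, the complete-records mean of Y is biased here, whereas MI with a correctly specified imputation model is not.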

Study design and setting: Using simulation and real data analysis, we investigated how imputation model mis-specification affected MI performance, comparing results with complete records analysis (CRA). We considered scenarios in which imputation model mis-specification occurred because (i) the analysis model was mis-specified or (ii) the relationship between exposure and confounder was mis-specified.

Results: Mis-specification of the relationship between outcome and exposure, or between exposure and confounder, can cause biased CRA and MI estimates (in addition to any bias in the full-data estimate due to analysis model mis-specification). MI by predictive mean matching can mitigate model mis-specification. Methods for examining model mis-specification were effective in identifying mis-specified relationships.
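The toy simulation below illustrates the first and second results; it is a sketch under our own assumptions, not the paper's simulation design. The outcome Y is missing at random given X, but the hypothetical truth is E[Y | X] = X + X², so an imputation model with only a linear term in X is mis-specified. It compares CRA, normal-model MI with a linear term, the same MI with the correct quadratic term, and predictive mean matching (PMM, "type 1" matching) built on the linear predictor. The target is the marginal mean of Y (true value 1). Function names, k = 5 donors, M = 10 imputations and all data-generating values are hypothetical choices.

```python
import numpy as np

def posterior_draw(design, y, rng):
    """OLS fit plus one draw of (beta, sigma^2) under a flat prior."""
    n, p = design.shape
    xtx_inv = np.linalg.inv(design.T @ design)
    beta_hat = xtx_inv @ design.T @ y
    rss = np.sum((y - design @ beta_hat) ** 2)
    sigma2 = rss / rng.chisquare(n - p)
    beta = rng.multivariate_normal(beta_hat, sigma2 * xtx_inv)
    return beta_hat, beta, sigma2

def impute_once(x, y, rng, method="norm", quadratic=False, k=5):
    obs = ~np.isnan(y)
    cols = [np.ones_like(x), x] + ([x**2] if quadratic else [])
    design = np.column_stack(cols)
    beta_hat, beta, sigma2 = posterior_draw(design[obs], y[obs], rng)
    y_imp = y.copy()
    mis_idx = np.flatnonzero(~obs)
    if method == "norm":
        # Parametric draw: fitted value under the drawn beta plus normal noise.
        mu = design[mis_idx] @ beta
        y_imp[mis_idx] = mu + rng.normal(0.0, np.sqrt(sigma2), mu.size)
    elif method == "pmm":
        # PMM: donors scored with beta_hat, recipients with the drawn beta;
        # each recipient copies the observed y of one of its k closest donors.
        y_obs = y[obs]
        pred_obs = design[obs] @ beta_hat
        pred_mis = design[mis_idx] @ beta
        for i, target in zip(mis_idx, pred_mis):
            nearest = np.argpartition(np.abs(pred_obs - target), k)[:k]
            y_imp[i] = y_obs[rng.choice(nearest)]
    else:
        raise ValueError(f"unknown method: {method}")
    return y_imp

def mi_mean(x, y, rng, m=10, **kwargs):
    # Rubin's rules point estimate of the mean: average over the M imputations.
    return np.mean([impute_once(x, y, rng, **kwargs).mean() for _ in range(m)])

rng = np.random.default_rng(1)
n = 3000
x = rng.normal(size=n)
y = x + x**2 + rng.normal(size=n)          # true E[Y] = 1; E[Y|X] is quadratic
p_miss = 1 / (1 + np.exp(1 + 2 * x))       # MAR: low x more often missing
y[rng.uniform(size=n) < p_miss] = np.nan

results = {
    "CRA": np.nanmean(y),
    "MI norm linear": mi_mean(x, y, rng, method="norm"),
    "MI norm quadratic": mi_mean(x, y, rng, method="norm", quadratic=True),
    "MI pmm linear": mi_mean(x, y, rng, method="pmm"),
}
for name, est in results.items():
    print(f"{name:18s} estimate {est:6.3f}  bias {est - 1.0:+.3f}")
```

PMM helps in this setup because, with a single covariate and a non-zero slope, matching on the linear predictor amounts to matching on X, so the donors' observed Y values carry the true curvature. PMM cannot extrapolate beyond the range of observed donors, so it mitigates mis-specification rather than removing it.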

Conclusion: When using MI methods that assume the data are missing at random (MAR), compatibility between the analysis and imputation models is necessary but not sufficient to avoid bias. We propose a step-by-step procedure for identifying and correcting mis-specification of imputation models.
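The abstract does not spell out the authors' step-by-step procedure, so the sketch below shows only a generic functional-form check that we assume for illustration: in the complete records, test an added X² term and inspect the mean residuals of the linear-only fit within quantile bins of X. When missingness of Y depends only on X, the complete records give valid estimates of the distribution of Y given X, so a mis-specified linear term should show up in these checks. The helper names, the five bins, the seed and the informal |t| > 2 rule of thumb are assumptions.

```python
import numpy as np

def ols(design, y):
    """Least-squares coefficients and residuals."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta, y - design @ beta

def quadratic_term_t(x, y):
    """t-statistic for an added x^2 term, fitted to the complete records."""
    keep = ~np.isnan(x) & ~np.isnan(y)
    x, y = x[keep], y[keep]
    design = np.column_stack([np.ones_like(x), x, x**2])
    beta, resid = ols(design, y)
    n, p = design.shape
    sigma2 = resid @ resid / (n - p)
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return beta[2] / np.sqrt(cov[2, 2])

def binned_residual_means(x, y, n_bins=5):
    """Mean residual of the linear-only fit within quantile bins of x."""
    keep = ~np.isnan(x) & ~np.isnan(y)
    x, y = x[keep], y[keep]
    _, resid = ols(np.column_stack([np.ones_like(x), x]), y)
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))
    bins = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_bins - 1)
    return np.array([resid[bins == b].mean() for b in range(n_bins)])

rng = np.random.default_rng(7)
n = 2000
x = rng.normal(size=n)
missing = rng.uniform(size=n) < 1 / (1 + np.exp(1 + 2 * x))   # MAR given x
datasets = {
    "linear truth": 1 + 0.5 * x + rng.normal(size=n),
    "quadratic truth": x + x**2 + rng.normal(size=n),
}
t_stats = {}
for label, y in datasets.items():
    y = np.where(missing, np.nan, y)
    t_stats[label] = quadratic_term_t(x, y)
    print(label, round(t_stats[label], 1), np.round(binned_residual_means(x, y), 2))
```

A large t-statistic for the squared term, or a systematic pattern in the binned residual means, flags that the linear term alone is inadequate; the imputation model would then need a more flexible form for X (or a method such as PMM), while staying compatible with the analysis model.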

Keywords: Compatibility; Complete records analysis; Mis-specification; Missing data; Multiple imputation; Predictive mean matching.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Bias
Computer Simulation
Data Analysis*
Data Interpretation, Statistical
Humans
Research Design*
