MISL: Multiple imputation by super learning

Stat Methods Med Res. 2022 Oct;31(10):1904-1915. doi: 10.1177/09622802221104238. Epub 2022 Jun 5.

Abstract

Multiple imputation techniques are commonly used when data are missing; however, there are many options one can consider. Multivariate imputation by chained equations is a popular method for generating imputations, but it relies on the user specifying a model for each variable being imputed. In this work, we introduce multiple imputation by super learning, an update to the multivariate imputation by chained equations method that generates imputations with ensemble learning. Ensemble methodologies have recently gained attention for use in inference and prediction as they optimally combine a variety of user-specified parametric and non-parametric models and perform well when estimating complex functions, including those with interaction terms. Through two simulations, we compare inferences made using the multiple imputation by super learning approach to those made with other commonly used multiple imputation methods and demonstrate multiple imputation by super learning as a superior option when considering characteristics such as bias, confidence interval coverage rate, and confidence interval width.

Keywords: Fully conditional specification; machine learning; missing data; multiple imputation; super learning.
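
To make the idea in the abstract concrete, the following Python sketch shows one way an ensemble can stand in for a single user-specified imputation model. It is an illustration under strong simplifying assumptions, not the authors' implementation: there is one incomplete variable and one complete predictor, the candidate library holds only two learners (a mean model and a simple linear regression), the ensemble is a convex combination whose weight is chosen by grid search on cross-validated squared error, and imputations add a randomly drawn observed residual to the ensemble prediction. All function names are hypothetical, and uncertainty in the fitted models is not propagated as a full multiple imputation procedure would require.

```python
import random
import statistics

def fit_mean(xs, ys):
    # Candidate learner 1: always predict the observed mean.
    m = statistics.fmean(ys)
    return lambda x: m

def fit_linear(xs, ys):
    # Candidate learner 2: simple least-squares line.
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)

LEARNERS = [fit_mean, fit_linear]  # assumed, deliberately tiny library

def cv_predictions(xs, ys, folds=5):
    # Out-of-fold predictions from every candidate learner.
    n = len(xs)
    preds = [[0.0] * n for _ in LEARNERS]
    for k in range(folds):
        train = [i for i in range(n) if i % folds != k]
        test = [i for i in range(n) if i % folds == k]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        for j, fit in enumerate(LEARNERS):
            model = fit(tx, ty)
            for i in test:
                preds[j][i] = model(xs[i])
    return preds

def ensemble_weight(xs, ys, grid=101):
    # Weight on the linear learner, in [0, 1], minimizing CV squared error.
    p_mean, p_lin = cv_predictions(xs, ys)
    def loss(w):
        return sum((y - ((1 - w) * a + w * b)) ** 2
                   for y, a, b in zip(ys, p_mean, p_lin))
    return min((i / (grid - 1) for i in range(grid)), key=loss)

def impute(xs, ys, m=5, seed=0):
    # Return the ensemble weight and m completed copies of ys (None = missing).
    rng = random.Random(seed)
    obs = [i for i, y in enumerate(ys) if y is not None]
    ox, oy = [xs[i] for i in obs], [ys[i] for i in obs]
    w = ensemble_weight(ox, oy)
    f_mean, f_lin = fit_mean(ox, oy), fit_linear(ox, oy)
    def predict(x):
        return (1 - w) * f_mean(x) + w * f_lin(x)
    residuals = [y - predict(x) for x, y in zip(ox, oy)]
    completed = [
        [y if y is not None else predict(x) + rng.choice(residuals)
         for x, y in zip(xs, ys)]
        for _ in range(m)
    ]
    return w, completed

# Toy data: y depends linearly on x; every fourth value of y is missing.
_rng = random.Random(1)
xs = [float(i) for i in range(40)]
ys = [None if i % 4 == 0 else 2.0 * x + 1.0 + _rng.gauss(0.0, 1.0)
      for i, x in enumerate(xs)]
weight, completed = impute(xs, ys, m=5)
```

In a chained-equations setting, a step like `impute` would be repeated for each incomplete variable in turn, cycling until the imputations stabilize, and each of the `m` completed data sets would then be analyzed separately and the results pooled.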

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Bias
  • Computer Simulation
  • Data Interpretation, Statistical
  • Research Design*