Handling Missing Data in Instrumental Variable Methods for Causal Inference

Annu Rev Stat Appl. 2019 Mar;6(1):125-148. doi: 10.1146/annurev-statistics-031017-100353. Epub 2018 Nov 28.

Abstract

Instrument data are often missing in instrumental variable studies. For example, in the Wisconsin Longitudinal Study one can use genotype data as a Mendelian randomization-style instrument, but this information is often missing when subjects do not contribute saliva samples or when the genotyping platform output is ambiguous. Here we review missing-at-random assumptions that can be used to identify instrumental variable causal effects, and discuss various approaches for estimation and inference. We consider likelihood-based methods, regression and weighting estimators, and doubly robust estimators. The likelihood-based methods yield the most precise inference and are optimal under the model assumptions, whereas the doubly robust estimators can attain the nonparametric efficiency bound while allowing flexible nonparametric estimation of nuisance functions (e.g., instrument propensity scores). The regression and weighting estimators are sometimes the easiest to describe and implement. Our main contribution is an extensive review of this wide array of estimators under varied missing-at-random assumptions, along with discussion of asymptotic properties and inferential tools. We also implement many of the estimators in an analysis of the Wisconsin Longitudinal Study to study the effects of impaired cognitive functioning on depression.

Keywords: causal inference; instrumental variable; missing data; observational study; semiparametric efficiency.
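
As a rough illustration of the weighting idea mentioned in the abstract, the following minimal sketch applies inverse probability weighting to a Wald (ratio) instrumental variable estimator when the instrument is missing at random. It is not the authors' estimator or code. The simulated data, the function names `simulate` and `ipw_wald`, the binary covariate, the constant treatment effect of 2.0, and the choice to let missingness depend only on the fully observed treatment and covariate are all assumptions made for illustration.

```python
import numpy as np

def simulate(n, seed=0):
    """Hypothetical data: binary instrument z, binary treatment a, outcome y."""
    rng = np.random.default_rng(seed)
    x = rng.binomial(1, 0.5, n)            # fully observed covariate
    u = rng.normal(size=n)                 # unmeasured confounder
    z = rng.binomial(1, 0.5, n)            # randomized instrument
    a = (0.8 * z + 0.5 * x + u + rng.normal(size=n) > 0.7).astype(int)
    y = 2.0 * a + x + u + rng.normal(size=n)   # assumed constant effect of 2.0
    # Missing at random: the chance of observing z depends only on (a, x).
    p_obs = 0.9 - 0.5 * a - 0.2 * x
    r = rng.binomial(1, p_obs)             # r = 1 means z is observed
    z_obs = np.where(r == 1, z, np.nan)
    strata = 2 * a + x                     # the four (a, x) cells
    return y, a, z_obs, r, strata

def ipw_wald(y, a, z_obs, r, strata):
    """Weighted Wald estimator using only rows with an observed instrument."""
    # Estimate P(r = 1 | stratum) by the observed fraction in each cell.
    pi = np.empty(len(r), dtype=float)
    for s in np.unique(strata):
        cell = strata == s
        pi[cell] = r[cell].mean()
    obs = r == 1
    w = 1.0 / pi[obs]                      # inverse probability weights
    yo, ao, zo = y[obs], a[obs], z_obs[obs]

    def wmean(v, mask):
        return np.sum(w[mask] * v[mask]) / np.sum(w[mask])

    z1, z0 = zo == 1, zo == 0
    numerator = wmean(yo, z1) - wmean(yo, z0)     # weighted effect of z on y
    denominator = wmean(ao, z1) - wmean(ao, z0)   # weighted effect of z on a
    return numerator / denominator

y, a, z_obs, r, strata = simulate(200_000, seed=0)
estimate = ipw_wald(y, a, z_obs, r, strata)
```

In this setup the weights are estimated from cell frequencies because the variables driving missingness are discrete; with continuous predictors one would presumably fit a model for the probability that the instrument is observed instead.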