Using multiple imputation to address the inconsistent distribution of a controlling variable when modeling an infrequent outcome

J Mod Appl Stat Methods. 2017;16(1):744-752. doi: 10.22237/jmasm/1493599140.

Abstract

Temporal changes in methods for collecting longitudinal data can generate inconsistent distributions of affected variables, but their effects on parameter estimates have not been well described. We examined differences in Apgar scores of infants born in 2000-2006 to women with ovulatory dysfunction (risk group) or tubal obstruction (reference group) who underwent assisted reproductive technology (ART), using Florida, Massachusetts, and Michigan birth certificate data linked to the Centers for Disease Control and Prevention's National ART Surveillance System database. In Florida, information on induction of labor (a control variable) was inconsistent because of a 2004 change in birth certificate format. To control for bias that this inconsistent distribution of labor induction might introduce, we analyzed multiply imputed data. We used the Cox-Iannacchione weighted sequential hot deck method to multiply impute labor induction values in Florida data collected before the change, and missing values in Florida data collected after the change and in all Massachusetts and Michigan data. The adjusted odds ratios for low Apgar score were 1.94 (95% confidence interval [CI] 1.32-2.85) using imputed induction of labor and 1.83 (95% CI 1.20-2.80) using non-imputed induction of labor. Compared with the estimate from multiple imputation, the estimate obtained using non-imputed induction of labor was biased towards the null with inflated standard errors, but the magnitude of the differences was small.
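The comparison of the two adjusted odds ratios reported above can be checked roughly by back-calculating the standard error of each log odds ratio from its 95% CI. The sketch below assumes the intervals are Wald-type, that is, symmetric on the log scale with a critical value of 1.96; the abstract does not state how the intervals were constructed, and the function name `log_or_and_se` is hypothetical.

```python
import math


def log_or_and_se(or_est, ci_low, ci_high, z=1.96):
    """Return (log odds ratio, approximate SE of the log odds ratio).

    Assumption: the CI is a Wald interval, exp(log_or +/- z * se),
    so the SE is the log-scale CI width divided by 2 * z.
    """
    log_or = math.log(or_est)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return log_or, se


# Estimates and 95% CIs as reported in the abstract.
with_imputation = log_or_and_se(1.94, 1.32, 2.85)
without_imputation = log_or_and_se(1.83, 1.20, 2.80)

for label, (log_or, se) in [
    ("imputed induction of labor", with_imputation),
    ("induction of labor without imputation", without_imputation),
]:
    print(f"{label}: log(OR) = {log_or:.3f}, approx. SE = {se:.3f}")
```

Under that assumption, the estimate without imputation has the smaller log odds ratio and the larger approximate standard error, which matches the direction described in the abstract; the values are approximations derived from rounded figures, not results reported by the authors.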

Keywords: assisted reproductive technology; inconsistent data distribution; multiple imputation; weighted sequential hot deck.