Do Missing Values Influence Outcomes in a Cross-sectional Mail Survey?

Mayo Clin Proc Innov Qual Outcomes. 2021 Jan 19;5(1):84-93. doi: 10.1016/j.mayocpiqo.2020.09.006. eCollection 2021 Feb.

Abstract

Objective: To determine the effects of missing and inconsistent data on the results of a weight management mail survey.

Patients and methods: Weight management surveys were sent to 5000 overweight and obese individuals in the Learning Health System Network. Survey information was collected between October 27, 2017, and March 1, 2018. Some participants reported body mass index (BMI) values inconsistent with the intended overweight and obese sampling cohort. Analyses were performed after excluding these surveys and again after setting the inconsistent low BMI values to missing. Models were then rerun after imputing missing values using expectation-maximization, Markov chain Monte Carlo, random forest imputation, multivariate imputation by chained equations, and multiple imputation, and after replacing missing BMI values with the minimum, maximum, mean, or median of the known BMI values.
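
The two simplest steps described above, setting inconsistent BMI values to missing and then replacing missing values with a summary of the known values, can be sketched as follows. This is a minimal illustration, not the authors' code: the cutoff of 25 kg/m2 is an assumption based on the conventional definition of overweight (the abstract does not state the threshold used), the function names are hypothetical, and the BMI values are invented for the example.

```python
from statistics import mean, median

# Assumed cutoff: overweight is conventionally BMI >= 25 kg/m2. The abstract
# does not report the threshold the authors applied.
ASSUMED_MIN_VALID_BMI = 25.0


def set_invalid_to_missing(bmis, cutoff=ASSUMED_MIN_VALID_BMI):
    """Return a copy in which BMI values below the cutoff become None (missing)."""
    return [None if (b is not None and b < cutoff) else b for b in bmis]


def impute_constant(bmis, strategy):
    """Fill missing (None) values with the min, max, mean, or median of known values."""
    known = [b for b in bmis if b is not None]
    summary = {"min": min, "max": max, "mean": mean, "median": median}[strategy]
    fill = summary(known)
    return [fill if b is None else b for b in bmis]


# Illustrative values only; these are not data from the study.
reported = [31.2, None, 19.5, 42.0, 27.8, None, 36.1]
cleaned = set_invalid_to_missing(reported)
filled = {s: impute_constant(cleaned, s) for s in ("min", "max", "mean", "median")}
```

Refitting the same model on each filled data set and comparing the estimates mirrors the sensitivity check described here. The model-based methods (expectation-maximization, Markov chain Monte Carlo, random forest, chained equations) need dedicated statistical software and are not shown.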

Results: Of 2799 surveys, 222 (8%) had missing BMI values and 155 (6%) reported invalid BMI values. Overall, 725 of these 2799 surveys (26%) were missing at least 1 variable that was essential to the main analyses. Different imputation methods consistently found that BMI was related to age, sex, race, marital status, and education. Patients with a BMI of 35.0 kg/m² or greater were more likely to feel judged because of their weight, and patients with a BMI of 40.0 kg/m² or greater were more likely to feel they were not always treated with respect and as an equal.

Conclusion: Analyses using different imputation methods were consistent with the original published results. Missing data likely did not affect the study results.

Keywords: BMI, body mass index; MAR, missing at random; MCAR, missing completely at random; MCMC, Markov chain Monte Carlo; MNAR, missing not at random; OR, odds ratio.