Clinical research: a novel approach to regression substitution for handling missing data

Am J Ther. 2013 Sep-Oct;20(5):514-9. doi: 10.1097/MJT.0b013e3181ff7a7b.

Abstract

In clinical research, missing data are common. Imputed data are not real data but constructed values that should increase the sensitivity of testing. Regression substitution for the purpose of data imputation has often not provided better sensitivity than other methods. The objective of this study was to compare regression substitution with other methods of missing-data imputation, taking into account particular quality measures. A real-data example with a file of 105 values was used. After 5 values were randomly removed from the file, mean imputation and hot-deck imputation were compared with regression substitution under the following requirements: (1) at least 2 independent variables are present in the equation, (2) no more than 1 datum per patient is missing, (3) no more than 5% of the data are missing, (4) if more than 5% of the data are missing, 5% are randomly chosen for regression substitution and the remainder are deleted, (5) only statistically significant variables are present in the regression model, and (6) no random errors are added to the imputed data. The test statistics after regression substitution were much better than those after the other 2 methods: F-values of 44.1 vs 29.4 and 30.1, and t-values of 7.6 vs 5.6 and 5.7 and of 3.0 vs 1.7 and 1.8. We conclude that regression substitution is a very sensitive method for imputing missing data, provided that particular quality measures are taken into account.
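The three imputation methods compared in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' code and uses synthetic, hypothetical data (35 made-up patients with 3 variables, giving 105 values) rather than the study file. All function names are hypothetical. Several simplifications are assumptions: hot-deck imputation is shown in its simplest form (a donor value drawn at random from the observed values), because the abstract does not state the donor rule, and requirement (5), keeping only statistically significant predictors, is not implemented. The sketch does enforce requirements (2) and (3), and, per requirement (6), regression substitution inserts the fitted value with no random error added.

```python
import random
from statistics import mean


def _replace(row, col, value):
    """Return a copy of row with position col set to value."""
    return row[:col] + [value] + row[col + 1:]


def check_requirements(rows, max_fraction=0.05):
    """Requirements (2) and (3): at most one missing value (None) per
    patient row, and at most 5% of all values missing."""
    if any(sum(v is None for v in r) > 1 for r in rows):
        raise ValueError("more than one missing value in a row")
    n_missing = sum(v is None for r in rows for v in r)
    if n_missing > max_fraction * sum(len(r) for r in rows):
        raise ValueError("more than 5% of the values are missing")


def mean_impute(rows, col):
    """Replace missing values in col by the mean of the observed values."""
    fill = mean(r[col] for r in rows if r[col] is not None)
    return [_replace(r, col, fill) if r[col] is None else r[:] for r in rows]


def hot_deck_impute(rows, col, rng):
    """Simplest hot deck (assumption): copy a randomly chosen observed value."""
    donors = [r[col] for r in rows if r[col] is not None]
    return [_replace(r, col, rng.choice(donors)) if r[col] is None else r[:]
            for r in rows]


def _solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(m[i][c]))
        m[c], m[p] = m[p], m[c]
        for i in range(n):
            if i != c:
                f = m[i][c] / m[c][c]
                m[i] = [vi - f * vc for vi, vc in zip(m[i], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(xs, ys):
    """Least-squares coefficients [b0, b1, ...] for ys ~ 1 + xs."""
    X = [[1.0] + list(x) for x in xs]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(X, ys)) for i in range(k)]
    return _solve(xtx, xty)


def regression_impute(rows, col):
    """Regression substitution: fit col on the other columns using complete
    rows, then insert the predicted value with no random error added."""
    complete = [r for r in rows if None not in r]
    others = [j for j in range(len(rows[0])) if j != col]
    b = ols([[r[j] for j in others] for r in complete], [r[col] for r in complete])
    # Requirement (2) guarantees the other columns of an incomplete row are observed.
    return [_replace(r, col, b[0] + sum(bj * r[j] for bj, j in zip(b[1:], others)))
            if r[col] is None else r[:] for r in rows]


def f_statistic(rows):
    """Overall F for column 0 regressed on the remaining columns."""
    xs, ys = [r[1:] for r in rows], [r[0] for r in rows]
    b = ols(xs, ys)
    fitted = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], x)) for x in xs]
    y_bar = mean(ys)
    ssr = sum((f - y_bar) ** 2 for f in fitted)
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    p, n = len(xs[0]), len(ys)
    return (ssr / p) / (sse / (n - p - 1))


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic stand-in for the 105-value file: 35 hypothetical patients x 3 variables.
    full = []
    for _ in range(35):
        x1, x2 = rng.gauss(10, 2), rng.gauss(5, 1)
        full.append([2 + 1.5 * x1 - 0.8 * x2 + rng.gauss(0, 2), x1, x2])
    rows = [r[:] for r in full]
    for i in rng.sample(range(35), 5):  # 5 distinct patients: at most 1 missing each
        rows[i][rng.randrange(3)] = None
    check_requirements(rows)
    methods = [("mean", mean_impute),
               ("hot deck", lambda r, c: hot_deck_impute(r, c, rng)),
               ("regression", regression_impute)]
    for name, method in methods:
        filled = rows
        for col in range(3):
            filled = method(filled, col)
        print(f"{name:10s} F = {f_statistic(filled):.1f}")
```

Running the script prints the overall F-statistic after each method. Because regression substitution places imputed outcome values on the fitted regression plane and adds no random error, it tends to shrink the residual variance and so inflate F. This matches the direction of the abstract's results, though the synthetic numbers are not comparable with the study's values.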

MeSH terms

  • Clinical Trials as Topic / methods*
  • Clinical Trials as Topic / statistics & numerical data*
  • Humans
  • Reproducibility of Results
  • Research Design*