[Missing Data Replacement Methods in Different Scenarios]

Sichuan Da Xue Xue Bao Yi Xue Ban. 2018 May;49(3):430-435.
[Article in Chinese]

Abstract

Objective: To compare the effects of different missing data replacement methods on the estimated regression coefficient r of "length of stay" on "hospital expenditure".

Methods: Data were extracted from the medical records of patients with head and neck neoplasms admitted to Sichuan Cancer Hospital. Simulated datasets were generated and processed in R 3.4.1. Scenarios were constructed by Monte Carlo simulation, varying the proportion of missing data and the missing-data mechanism. Three strategies for handling missing data were tested: the complete case method, expectation maximization (EM), and the Markov chain Monte Carlo (MCMC) method. The regression coefficient estimate r of standardized "length of stay" on standardized log-transformed "hospital expenditure" was calculated under each strategy and compared with that from the original complete dataset in terms of accuracy (magnitude of the difference in r) and precision (difference in the standard error of r).
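The simulation design described in the Methods can be illustrated with a minimal sketch. It is not the authors' code: the study used R 3.4.1, whereas this example uses only the Python standard library, and the data are synthetic rather than hospital records. The function names, the true slope of 0.6, and the MAR weights (1.5 and 0.5) are all hypothetical choices made for illustration; in particular, the sketch does not attempt to reproduce the MAR (2:1) or MAR (1:2) schemes, which the abstract does not define. Only the complete case strategy is shown; EM and MCMC imputation are omitted.

```python
import math
import random


def simulate_pairs(n, slope, rng):
    """Synthetic (x, y) pairs: x ~ N(0, 1), y = slope * x + N(0, 1) noise."""
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = slope * x + rng.gauss(0.0, 1.0)
        pairs.append((x, y))
    return pairs


def delete_mcar(pairs, proportion, rng):
    """MCAR: every y is set to None with the same probability."""
    return [(x, None if rng.random() < proportion else y) for x, y in pairs]


def delete_mar(pairs, proportion, rng):
    """MAR: the chance that y is missing depends only on the observed x.

    The weights 1.5 and 0.5 are arbitrary (an assumption of this sketch);
    they keep the overall missing proportion close to `proportion`.
    """
    out = []
    for x, y in pairs:
        p = proportion * (1.5 if x > 0 else 0.5)
        out.append((x, None if rng.random() < p else y))
    return out


def fit_complete_case(pairs):
    """Drop incomplete rows, then fit y = a + b * x by least squares.

    Returns (slope, standard error of slope, number of complete cases).
    """
    cc = [(x, y) for x, y in pairs if y is not None]
    n = len(cc)
    mean_x = sum(x for x, _ in cc) / n
    mean_y = sum(y for _, y in cc) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in cc)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in cc)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - intercept - slope * x) ** 2 for x, y in cc)
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return slope, std_err, n


if __name__ == "__main__":
    rng = random.Random(2018)
    full = simulate_pairs(1000, 0.6, rng)
    print("complete data:", fit_complete_case(full))
    print("MCAR 30%:    ", fit_complete_case(delete_mcar(full, 0.3, rng)))
    print("MAR 30%:     ", fit_complete_case(delete_mar(full, 0.3, rng)))
```

Repeating the deletion step many times per scenario and averaging the results would give the Monte Carlo comparison of accuracy and precision that the Methods describe.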

Results: All three replacement methods gave acceptable estimates (within the limit rc ± 0.5 sc) when missing data were generated under the MAR (2:1) mechanism, or when less than 30% of the data were simulated as missing under the MCAR and MAR (1:2) mechanisms. The EM method had the best estimation precision.
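The acceptance limit rc ± 0.5 sc can be written as a simple check. The abstract does not define rc and sc; this sketch assumes they are the regression coefficient and its standard error from the original complete dataset. The function name and the numbers in the usage lines are made up for illustration and are not results from the study.

```python
def within_limit(r_hat, r_c, s_c, k=0.5):
    """Return True if r_hat lies within r_c +/- k * s_c.

    Assumption of this sketch: r_c and s_c are the complete-data
    coefficient and its standard error; k = 0.5 follows the abstract.
    """
    return abs(r_hat - r_c) <= k * s_c


if __name__ == "__main__":
    # Hypothetical values, not taken from the study.
    print(within_limit(0.62, 0.60, 0.05))
    print(within_limit(0.66, 0.60, 0.05))
```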

Conclusion: The choice of missing data replacement method should take into account the proportion of missing data and the likely missing-data mechanism.

Keywords: Expectation maximization (EM); Markov chain Monte Carlo (MCMC); Mechanism of missing; Missing data replacement; Proportion of missing.

MeSH terms

  • Data Accuracy
  • Health Expenditures
  • Humans
  • Length of Stay
  • Markov Chains*
  • Medical Records*
  • Monte Carlo Method*