Estimating Incremental Validity Under Missing Data

Multivariate Behav Res. 2017 Mar-Apr;52(2):164-177. doi: 10.1080/00273171.2016.1259099. Epub 2016 Dec 20.

Abstract

A common form of missing data is caused by selection on an observed variable (e.g., Z). If the selection variable was measured and is available, the data are regarded as missing at random (MAR). Selection biases correlation, reliability, and effect size estimates when these estimates are computed on listwise deleted (LD) data sets. Maximum likelihood (ML) estimates, by contrast, are generally unbiased and outperform LD in most situations, at least when the data are MAR. The exception is the partial correlation: when the cause of missingness is partialled out, LD estimates are unbiased, so ML estimates offer no advantage over LD estimates in this situation. We demonstrate that under MAR, even ML estimates may become biased, depending on how partial correlations are computed. Finally, we offer recommendations on how future researchers might estimate partial correlations even when the cause of missingness is unknown and, perhaps, unknowable.
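To make the listwise-deletion claim concrete, here is a minimal Python sketch (standard library only) under hypothetical population values that are not taken from the paper. X, Y, and Z are standard normal with all pairwise correlations equal to 0.5, which makes the population partial correlation r_XY·Z equal to 1/3. Y is observed only when Z > 0, a MAR mechanism driven by the observed Z. The sketch compares full-data and LD estimates of r_XY and r_XY·Z. It does not reproduce the paper's simulation design or its ML analyses.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def partial_corr(x, y, z):
    """First-order partial correlation r_xy.z from the zero-order correlations."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def simulate(n=100_000, seed=1):
    """Compare full-data and listwise-deleted (LD) estimates under selection on Z."""
    rng = random.Random(seed)
    c = math.sqrt(1 / 3)      # hypothetical choice: gives r_xy = 0.5 and r_xy.z = 1/3
    e = math.sqrt(1 - c * c)  # keeps each within-Z component at unit variance
    xs, ys, zs = [], [], []
    for _ in range(n):
        z = rng.gauss(0, 1)
        w = rng.gauss(0, 1)   # shared component linking X and Y beyond Z
        xs.append(0.5 * z + math.sqrt(0.75) * (c * w + e * rng.gauss(0, 1)))
        ys.append(0.5 * z + math.sqrt(0.75) * (c * w + e * rng.gauss(0, 1)))
        zs.append(z)

    # Assumed MAR mechanism: Y is missing whenever Z <= 0; LD keeps complete cases.
    keep = [i for i in range(n) if zs[i] > 0]
    xl = [xs[i] for i in keep]
    yl = [ys[i] for i in keep]
    zl = [zs[i] for i in keep]

    return {
        "full_r": pearson(xs, ys),
        "ld_r": pearson(xl, yl),
        "full_partial": partial_corr(xs, ys, zs),
        "ld_partial": partial_corr(xl, yl, zl),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name:>12}: {value:.3f}")
```

Under these assumed values, the LD zero-order correlation should fall noticeably below the full-data value of about 0.5 (to roughly 0.4, because of range restriction on Z). The LD partial correlation should stay close to 1/3. In this linear, multivariate-normal setup, selecting on Z leaves the regressions of X and Y on Z unchanged, so partialling out Z removes the selection effect.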

Keywords: Missing data; incremental validity; listwise deletion; maximum likelihood.

MeSH terms

  • Algorithms
  • Computer Simulation
  • Data Interpretation, Statistical*
  • Educational Status
  • Humans
  • Likelihood Functions*
  • Monte Carlo Method
  • Multivariate Analysis*
  • Reproducibility of Results
  • Socioeconomic Factors
  • Students
  • Universities