Simulation-Based Performance Evaluation of Missing Data Handling in Network Analysis

Multivariate Behav Res. 2024 Jan 21:1-21. doi: 10.1080/00273171.2023.2283638. Online ahead of print.

Abstract

Network analysis has gained popularity as an approach to investigating psychological constructs. However, there are currently no guidelines for applied researchers who encounter missing values. In this simulation study, we compared the performance of three approaches: a two-step EM algorithm with separate steps for missing data handling and regularization, a combined direct EM algorithm, and pairwise deletion. We investigated conditions with varying network sizes, numbers of observations, missing data mechanisms, and percentages of missing values. The approaches were evaluated on how well they recovered the population networks, in terms of loss in the precision matrix, edge set identification, and network statistics. The simulation showed adequate performance only in conditions with large samples (n ≥ 500) or small networks (p = 10). Among the missing data approaches, the direct EM appeared to be more sensitive and was superior in nearly all chosen conditions. The two-step EM yielded better results when the ratio n/p was very large, being less sensitive but more specific. Pairwise deletion failed to converge in numerous conditions and yielded inferior results overall. Overall, direct EM is recommended in most cases, as it mitigates the impact of missing data well, while modifications to the two-step EM could improve its performance.
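The sensitivity and specificity mentioned above refer to edge set identification: how many true edges an estimated network recovers, and how many absent edges it correctly leaves out. The following is a minimal Python sketch of these two metrics plus one possible precision-matrix loss. It assumes that an edge (i, j) is present whenever the off-diagonal precision entry is nonzero, and it uses the Frobenius norm as the loss. The abstract does not state which loss the study used, so that choice is an assumption. The function names `edge_set_metrics` and `frobenius_loss` are hypothetical, and this is not the authors' code.

```python
import numpy as np


def edge_set_metrics(true_prec, est_prec, tol=1e-8):
    """Sensitivity and specificity of the edge set implied by est_prec.

    Assumption: edge (i, j), i < j, is present when |precision[i, j]| > tol.
    """
    true_prec = np.asarray(true_prec, dtype=float)
    est_prec = np.asarray(est_prec, dtype=float)
    iu = np.triu_indices_from(true_prec, k=1)  # each pair once, no diagonal
    true_edges = np.abs(true_prec[iu]) > tol
    est_edges = np.abs(est_prec[iu]) > tol

    tp = np.sum(true_edges & est_edges)    # true edges that were recovered
    fn = np.sum(true_edges & ~est_edges)   # true edges that were missed
    tn = np.sum(~true_edges & ~est_edges)  # absent edges left out
    fp = np.sum(~true_edges & est_edges)   # spurious edges

    sensitivity = tp / (tp + fn) if (tp + fn) > 0 else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) > 0 else float("nan")
    return float(sensitivity), float(specificity)


def frobenius_loss(true_prec, est_prec):
    """Frobenius-norm distance between two precision matrices (assumed loss)."""
    diff = np.asarray(true_prec, dtype=float) - np.asarray(est_prec, dtype=float)
    return float(np.linalg.norm(diff, "fro"))


if __name__ == "__main__":
    # Hypothetical 4-node example: true edges (0,1), (1,2); estimate finds (0,1), (2,3).
    true_prec = np.array([[1.0, -0.3, 0.0, 0.0],
                          [-0.3, 1.0, 0.2, 0.0],
                          [0.0, 0.2, 1.0, 0.0],
                          [0.0, 0.0, 0.0, 1.0]])
    est_prec = np.array([[1.0, -0.25, 0.0, 0.0],
                         [-0.25, 1.0, 0.0, 0.0],
                         [0.0, 0.0, 1.0, 0.1],
                         [0.0, 0.0, 0.1, 1.0]])
    print(edge_set_metrics(true_prec, est_prec))  # (0.5, 0.75)
    print(frobenius_loss(true_prec, est_prec))
```

In this toy example, one of two true edges is recovered (sensitivity 0.5) and three of four absent edges are correctly left out (specificity 0.75). A method described as "more sensitive but less specific" trades the second number for the first.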

Keywords: EM algorithms; network analysis; graphical lasso regularization; missing values; simulation study.