Predicting and Correcting Missing Data on Diffusion Processes in Multiplex Networks

Alireza Khosravani; Mostafa Salehi; Vahid Ranjbar; Rajesh Sharma; Shaghayegh Najari


Nonlinear Dynamics Psychol Life Sci. 2021 Apr;25(2):127-155.

Authors

Alireza Khosravani¹, Mostafa Salehi¹, Vahid Ranjbar², Rajesh Sharma³, Shaghayegh Najari¹

Affiliations

¹ University of Tehran, Tehran, Iran.
² Yazd University, Yazd, Iran.
³ University of Tartu, Tartu, Estonia.

PMID: 33838696

Abstract

The diffusion process in networks is studied with the objective of identifying its dynamics and predicting the behavior of network entities. Social media plays an important role in people's lives. Diffusion processes, one of the most important branches of social media analysis, appear in various domains such as information spreading, diffusion of innovation, idea dissemination, and product acceptance, where they are used to identify users' patterns and behavior in social media networks. Users are not limited to one social network; they are engaged in multiple social media platforms such as Twitter, Instagram, Telegram, and Facebook. This has given rise to a new area of social network analysis, called multiplex network analysis, and the scope of diffusion process analysis has accordingly shifted from single-layer networks to multiplex networks. Diffusion processes can be studied at both the infrastructure level and the diffusion level: at the infrastructure level, structural properties of the network such as the clustering coefficient and degree centrality are studied, while at the diffusion level, properties of the diffusion network such as diffusion depth and seed nodes are studied. A reliable analysis requires complete information on both the infrastructure and diffusion networks. However, complete data is not always accessible, owing to limitations such as the difficulty of crawling big data, social media policies on data gathering, and user privacy. Because incomplete data can lead to poor analysis, in this work we first investigate the impact of missing data in both infrastructure and diffusion networks, specifically the impact of random and non-random missing infrastructure data on nine properties of the diffusion network, such as the number of infected nodes, the number of infected edges, diffusion length, and the number of seed nodes. Second, based on the multiplex diffusion tree, we introduce a new model, named MLC-tree, for incomplete diffusion networks. Finally, we evaluate our model on both synthetic and real social networks; the results show that MLC-tree can decrease the relative error by more than 50 percent when 20 to 80 percent of the complete data is missing.
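To make the measurement setup concrete, here is a minimal Python sketch of how missing data can distort diffusion-level properties. It is not the authors' code and not the MLC-tree model. It assumes a single-layer diffusion forest of (parent, child) infection edges and uses simplified, assumed definitions: a seed is an infected node with no incoming infection edge, depth is the longest seed-to-node path in edges, and relative error is |true - observed| / true. The helper names (`diffusion_properties`, `observe`, `random_missing`, `targeted_missing`) and the non-random scheme (hiding the nodes that infected the most others) are hypothetical illustrations.

```python
import random
from collections import defaultdict


def diffusion_properties(edges, nodes):
    """Simple diffusion-level properties of a diffusion forest.

    Assumed (hypothetical) definitions for illustration:
      - seed: infected node with no incoming infection edge
      - depth: length in edges of the longest seed-to-node path
    """
    children = defaultdict(list)
    has_parent = set()
    for u, v in edges:
        children[u].append(v)
        has_parent.add(v)
    seeds = [n for n in nodes if n not in has_parent]
    # Depth-first traversal from every seed, tracking the deepest level reached.
    depth = 0
    stack = [(s, 0) for s in seeds]
    while stack:
        node, d = stack.pop()
        depth = max(depth, d)
        for c in children[node]:
            stack.append((c, d + 1))
    return {"infected_nodes": len(nodes), "infected_edges": len(edges),
            "seed_nodes": len(seeds), "depth": depth}


def observe(edges, nodes, missing):
    """Drop the missing nodes and every infection edge that touches them."""
    kept_nodes = nodes - missing
    kept_edges = {(u, v) for u, v in edges
                  if u in kept_nodes and v in kept_nodes}
    return kept_edges, kept_nodes


def random_missing(nodes, fraction, rng):
    """Random missingness: hide a uniformly sampled fraction of nodes."""
    k = int(round(fraction * len(nodes)))
    return set(rng.sample(sorted(nodes), k))


def targeted_missing(edges, nodes, fraction):
    """Non-random missingness (assumed scheme): hide the most infectious nodes."""
    out_deg = defaultdict(int)
    for u, _ in edges:
        out_deg[u] += 1
    k = int(round(fraction * len(nodes)))
    ranked = sorted(nodes, key=lambda n: (-out_deg[n], n))
    return set(ranked[:k])


def relative_error(true_props, observed_props):
    """|true - observed| / true for every property with a nonzero true value."""
    return {k: abs(true_props[k] - observed_props[k]) / true_props[k]
            for k in true_props if true_props[k] != 0}


if __name__ == "__main__":
    # Toy forest: two cascades, 0 -> {1, 2}, 1 -> 3 -> 4, and 5 -> 6.
    edges = {(0, 1), (0, 2), (1, 3), (3, 4), (5, 6)}
    nodes = set(range(7))
    truth = diffusion_properties(edges, nodes)
    rng = random.Random(42)
    for name, missing in [("random", random_missing(nodes, 0.3, rng)),
                          ("targeted", targeted_missing(edges, nodes, 0.15))]:
        obs = diffusion_properties(*observe(edges, nodes, missing))
        print(name, sorted(missing), relative_error(truth, obs))
```

In the toy forest, hiding only node 0 (the targeted case) turns nodes 1 and 2 into apparent seeds, so the seed count rises from 2 to 3 (a relative error of 0.5) while depth drops from 3 to 2. This illustrates why missing infrastructure data can bias properties such as seed nodes and diffusion length in different directions.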