An improved two-stage label propagation algorithm based on LeaderRank

PeerJ Comput Sci. 2022 May 18:8:e981. doi: 10.7717/peerj-cs.981. eCollection 2022.

Abstract

To solve the problems of poor stability and low modularity (Q) of community division results caused by the randomness of node selection and label update in the traditional label propagation algorithm, an improved two-stage label propagation algorithm based on LeaderRank was proposed in this study. In the first stage, the order of node updating was determined by the participation coefficient (PC). Then, a new similarity measure was defined to improve the label selection mechanism so as to solve the problem of label oscillation caused by multiple labels of the node with the most similarity to the node. Moreover, the influence of the nodes was comprehensively used to find the initial community structure. In the second stage, the rough communities obtained in the first stage were regarded as nodes, and their merging sequence was determined by the PC. Next, the non-weak community and the community with the largest number of connected edges were combined. Finally, the community structure was further optimized to improve the modularity so as to obtain the final partition result. Experiments were performed on nine classic realistic networks and 19 artificial datasets with different scales, complexities, and densities. The modularity and normalized mutual information (NMI) were used as evaluation indexes for comparing the improved algorithm with dozens of relevant classic algorithms. The results showed that the proposed algorithm yields superior performance, and the results of community partitioning obtained using the improved algorithm were stable and more accurate than those obtained using other algorithms. In addition, the proposed algorithm always performs well in nine large-scale artificial data sets with 6,000 to 50,000 nodes and three large realistic network datasets, which verifies its computational performance and utility in community detection for large-scale networks.

Keywords: Community division; Label propagation; LeaderRank; Modularity; Node influence; Weak community.

Grants and funding

This work was supported by the National Natural Science Foundation of China (42002138, 62172352), the Natural Science Foundation of Heilongjiang Province (LH2019F042), the postdoctoral scientific research development fund of Heilongjiang Province (No. LBH-Q20073) and the Excellent Young and Middle-aged Innovative Team Cultivation Foundation of Northeast Petroleum University (KYCXTDQ202101). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.