A Two-Phase Feature Selection Method for Identifying Influential Spreaders of Disease Epidemics in Complex Networks

Entropy (Basel). 2023 Jul 15;25(7):1068. doi: 10.3390/e25071068.

Abstract

Network epidemiology plays a fundamental role in understanding the relationship between network structure and epidemic dynamics, and identifying influential spreaders is an especially important part of this task. Most previous studies propose a centrality measure based on network topology to reflect the influence of spreaders, but such measures show limited universality. Machine learning enhances the identification of influential spreaders by combining multiple centralities. However, several centrality measures used in machine learning methods, such as closeness centrality, have high computational complexity on large networks. Here, we propose a two-phase feature selection method for identifying influential spreaders with a reduced feature dimension. Depending on the definition of influential spreaders, we obtain the optimal feature combination for different synthetic networks. Our results demonstrate that when the datasets are mildly or moderately imbalanced, centrality combinations that include the two-hop neighborhood are fundamental for Barabási-Albert (BA) scale-free networks, and centrality combinations that include degree centrality are essential for Erdős-Rényi (ER) random graphs. Meanwhile, for Watts-Strogatz (WS) small-world networks, feature selection is unnecessary. We also conduct experiments on real-world networks, and the selected features are highly similar to those selected for synthetic networks. Our method provides a new path for identifying superspreaders for the control of epidemics.
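
As an illustration of the two low-cost features highlighted above (degree and the two-hop neighborhood), the following minimal sketch computes both for every node of a small ER random graph. It is not the two-phase selection method itself. The helper names (`er_graph`, `two_hop_size`, `feature_table`) are hypothetical, and the two-hop neighborhood is assumed here to mean the number of distinct nodes within two hops of a node, excluding the node itself; the paper's exact definition may differ.

```python
import random


def er_graph(n, p, seed=0):
    """Erdős-Rényi G(n, p) graph stored as an adjacency dict of sets."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def degree(adj, v):
    """Number of direct neighbors of v."""
    return len(adj[v])


def two_hop_size(adj, v):
    """Assumed definition: distinct nodes within two hops of v, excluding v."""
    reached = set(adj[v])
    for u in adj[v]:
        reached |= adj[u]
    reached.discard(v)
    return len(reached)


def feature_table(adj):
    """Map each node to a (degree, two-hop size) feature pair."""
    return {v: (degree(adj, v), two_hop_size(adj, v)) for v in adj}


if __name__ == "__main__":
    g = er_graph(n=50, p=0.1, seed=42)
    features = feature_table(g)
    # Show the five nodes with the largest two-hop neighborhoods.
    top = sorted(features, key=lambda v: features[v][1], reverse=True)[:5]
    for v in top:
        print(v, features[v])
```

Both features need only local information around each node, so their cost grows with the number of edges rather than with all-pairs shortest paths, which is what makes closeness centrality expensive on large networks.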

Keywords: feature selection; influential node identification; machine learning; network epidemiology.