Fault-tolerant Hamiltonian cycle strategy for fast node fault diagnosis based on PMC in data center networks

Math Biosci Eng. 2024 Jan 10;21(2):2121-2136. doi: 10.3934/mbe.2024093.

Abstract

The PMC model, a system-level fault diagnosis model, detects faulty nodes solely through mutual testing among the nodes of a system, without additional physical equipment. For diagnosing server node faults in large-scale data center networks (DCNs), traditional algorithms based on the PMC model cannot deliver high diagnosability, high accuracy and high efficiency, because they cannot ensure that the testing nodes are fault-free. This paper first proposes a fault-tolerant Hamiltonian cycle fault diagnosis (FHFD) algorithm, which tests nodes in the order of a Hamiltonian cycle to ensure that the testing nodes are fault-free. To improve testing efficiency, a hierarchical diagnosis mechanism is further proposed, which exploits the recursive structure of DCNs to recursively divide a high-scale structure into a large number of low-scale structures. Additionally, we prove that, under the proposed diagnosis strategy, $2(n-2)n^{k-1}$ and $(n-2)t_{n,k}/t_{n,1}$ faulty nodes can be detected within a limited time for $BCube_{n,k}$ and $DCell_{n,k}$, respectively. Simulation experiments also show that the proposed strategy dramatically improves diagnosability and test efficiency.

Keywords: data center; fault-tolerant Hamiltonian cycle; server node fault diagnosis.
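
To make the two bounds stated in the abstract concrete, the following minimal sketch evaluates them for small parameters. It is an illustration under stated assumptions, not code from the paper: the abstract does not define $t_{n,k}$, so the sketch assumes it is the number of servers in $DCell_{n,k}$ and follows the commonly used DCell recurrence $t_{n,0} = n$, $t_{n,k} = t_{n,k-1}(t_{n,k-1}+1)$. The function names `dcell_servers`, `bcube_bound` and `dcell_bound` are hypothetical.

```python
def dcell_servers(n: int, k: int) -> int:
    """Server count t_{n,k} of DCell_{n,k}.

    ASSUMPTION: uses the commonly cited DCell recurrence
    t_{n,0} = n and t_{n,k} = t_{n,k-1} * (t_{n,k-1} + 1);
    the abstract itself does not define t_{n,k}.
    """
    if n < 2 or k < 0:
        raise ValueError("expected n >= 2 and k >= 0")
    t = n
    for _ in range(k):
        t = t * (t + 1)
    return t


def bcube_bound(n: int, k: int) -> int:
    """Bound 2(n-2)n^{k-1} stated in the abstract for BCube_{n,k}."""
    if n < 2 or k < 1:
        raise ValueError("expected n >= 2 and k >= 1")
    return 2 * (n - 2) * n ** (k - 1)


def dcell_bound(n: int, k: int) -> int:
    """Bound (n-2) * t_{n,k} / t_{n,1} stated in the abstract for DCell_{n,k}."""
    if n < 2 or k < 1:
        raise ValueError("expected n >= 2 and k >= 1")
    # Under the assumed recurrence, t_{n,1} divides t_{n,k} for every k >= 1,
    # so integer division is exact here.
    return (n - 2) * dcell_servers(n, k) // dcell_servers(n, 1)


if __name__ == "__main__":
    for n, k in [(4, 1), (4, 2), (6, 2)]:
        print(f"n={n}, k={k}: BCube bound={bcube_bound(n, k)}, "
              f"DCell bound={dcell_bound(n, k)}")
```

For example, with $n = 4$ and $k = 2$ the assumed recurrence gives $t_{4,1} = 20$ and $t_{4,2} = 420$, so the sketch evaluates the bounds to $2 \cdot 2 \cdot 4 = 16$ for $BCube_{4,2}$ and $2 \cdot 420 / 20 = 42$ for $DCell_{4,2}$.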