FedRAD: Heterogeneous Federated Learning via Relational Adaptive Distillation

Sensors (Basel). 2023 Jul 19;23(14):6518. doi: 10.3390/s23146518.

Abstract

As the Internet of Things (IoT) continues to develop, Federated Learning (FL) is gaining popularity as a distributed machine learning framework that does not compromise the data privacy of its participants. However, the data held by enterprises and factories in the IoT are often not independently and identically distributed (Non-IID), which degrades federated learning: clients forget global knowledge during their local training phase, which slows convergence and reduces accuracy. In this work, we propose FedRAD, a method based on relational knowledge distillation that enables local models to mine high-quality global knowledge from a higher-dimensional, relational perspective during local training, so that global knowledge is better retained and forgetting is mitigated. We also devise an entropy-wise adaptive weights module (EWAW) to better balance the loss of single-sample knowledge distillation against that of relational knowledge distillation, so that the student model can weigh the two losses according to prediction entropy and learn global knowledge more effectively. A series of experiments on CIFAR-10 and CIFAR-100 shows that FedRAD outperforms other advanced FL methods in both convergence speed and classification accuracy.
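To make the abstract's loss construction concrete, the sketch below illustrates the two distillation terms and an entropy-based weighting in plain Python. The exact EWAW formula, the embedding layers used for the relational term, and the function names here are assumptions for illustration, not the paper's implementation; in particular, `entropy_adaptive_weight` (normalized prediction entropy in [0, 1]) is a hypothetical stand-in for the module described in the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def entropy(probs):
    """Shannon entropy (nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def kd_loss(student_probs, teacher_probs):
    """Single-sample KD term: KL(teacher || student) on one prediction."""
    return sum(t * math.log(t / s)
               for t, s in zip(teacher_probs, student_probs) if t > 0)

def rkd_distance_loss(student_embs, teacher_embs):
    """Relational KD term: match mean-normalized pairwise distances
    between samples in a batch (squared error; Huber omitted for brevity)."""
    def norm_pdist(embs):
        d = [math.dist(embs[i], embs[j])
             for i in range(len(embs)) for j in range(i + 1, len(embs))]
        mean = sum(d) / len(d) or 1.0
        return [x / mean for x in d]
    ds, dt = norm_pdist(student_embs), norm_pdist(teacher_embs)
    return sum((a - b) ** 2 for a, b in zip(ds, dt)) / len(ds)

def entropy_adaptive_weight(teacher_probs):
    """Hypothetical EWAW rule: normalized entropy in [0, 1]; the higher the
    teacher's predictive entropy, the more weight on the relational term."""
    return entropy(teacher_probs) / math.log(len(teacher_probs))

def fedrad_local_loss(student_probs, teacher_probs, student_embs, teacher_embs):
    """Assumed combination of the two KD losses; the paper's exact
    weighting scheme may differ."""
    alpha = entropy_adaptive_weight(teacher_probs)
    return (alpha * rkd_distance_loss(student_embs, teacher_embs)
            + (1 - alpha) * kd_loss(student_probs, teacher_probs))
```

The intuition behind this kind of weighting: when the global (teacher) model is uncertain on a sample, its per-class probabilities carry little single-sample signal, so the student leans on the more robust inter-sample relations instead.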

Keywords: catastrophic forgetting; data heterogeneity; federated learning; knowledge distillation; self-adaption.