Federated learning with workload-aware client scheduling in heterogeneous systems

Li Li; Duo Liu; Moming Duan; Yu Zhang; Ao Ren; Xianzhang Chen; Yujuan Tan; Chengliang Wang

doi:10.1016/j.neunet.2022.07.030

Neural Netw. 2022 Oct:154:560-573. doi: 10.1016/j.neunet.2022.07.030. Epub 2022 Aug 1.

Authors

Li Li¹, Duo Liu², Moming Duan³, Yu Zhang⁴, Ao Ren⁵, Xianzhang Chen⁶, Yujuan Tan⁷, Chengliang Wang⁸

Affiliations

¹ College of Computer Science, Chongqing University, Chongqing, China. Electronic address: li.li@cqu.edu.cn.
² College of Computer Science, Chongqing University, Chongqing, China. Electronic address: liuduo@cqu.edu.cn.
³ College of Computer Science, Chongqing University, Chongqing, China. Electronic address: duanmoming@cqu.edu.cn.
⁴ College of Computer Science, Chongqing University, Chongqing, China. Electronic address: zhangyucqu9@gmail.com.
⁵ College of Computer Science, Chongqing University, Chongqing, China. Electronic address: ren.ao@cqu.edu.cn.
⁶ College of Computer Science, Chongqing University, Chongqing, China. Electronic address: xzchen@cqu.edu.cn.
⁷ College of Computer Science, Chongqing University, Chongqing, China. Electronic address: tanyujuan@cqu.edu.cn.
⁸ College of Computer Science, Chongqing University, Chongqing, China. Electronic address: wangcl@cqu.edu.cn.

PMID: 35995021

Abstract

Federated Learning (FL) is a novel distributed machine learning paradigm that allows thousands of edge devices to train models locally without uploading their data to a central server. Because devices in real federated settings are resource-constrained, FL suffers from systems heterogeneity, which produces many stragglers and significantly degrades accuracy. To address systems heterogeneity and improve the robustness of the global model, we propose a novel adaptive federated framework. Specifically, we propose FedSAE, which uses each client's workload completion history to adaptively predict the training workload that the device can afford. Consequently, FedSAE can significantly reduce stragglers in highly heterogeneous systems. We also incorporate Active Learning into FedSAE to schedule participants dynamically: in each round the server evaluates each device's training value based on its training loss, and clients with larger values are selected with higher probability. As a result, model convergence is accelerated. Furthermore, we propose q-FedSAE, which combines FedSAE and q-FFL to improve global fairness in highly heterogeneous systems. Evaluations conducted in a highly heterogeneous system demonstrate that both FedSAE and q-FedSAE converge faster than FedAvg. In particular, FedSAE outperforms FedAvg across multiple federated datasets: on average, it improves testing accuracy by 22.19% and reduces stragglers by 90.69%. Moreover, while matching the accuracy of FedSAE, q-FedSAE achieves more robust convergence and fairer model performance than both q-FedAvg and FedSAE.
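
The two mechanisms described in the abstract, loss-based client selection and history-based workload prediction, can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything below is an assumption made for illustration: the function names are hypothetical, selection probability is taken to be proportional to the last reported training loss, and the workload update is a simple additive-increase, multiplicative-decrease rule rather than FedSAE's actual predictor.

```python
import random


def selection_probabilities(losses):
    """Map each client id to a selection probability.

    Assumption: probability is proportional to the client's last training
    loss (losses are non-negative). Falls back to uniform if all are zero.
    """
    total = sum(losses.values())
    if total <= 0:
        n = len(losses)
        return {cid: 1.0 / n for cid in losses}
    return {cid: loss / total for cid, loss in losses.items()}


def sample_clients(losses, k, rng):
    """Draw up to k distinct clients, weighted by loss, without replacement."""
    remaining = dict(losses)
    chosen = []
    for _ in range(min(k, len(remaining))):
        probs = selection_probabilities(remaining)
        ids = list(probs)
        pick = rng.choices(ids, weights=[probs[c] for c in ids], k=1)[0]
        chosen.append(pick)
        del remaining[pick]  # a client joins a round at most once
    return chosen


def update_workload(current, completed, grow=1.0, shrink=0.5, floor=1.0):
    """Predict the next round's workload (e.g. local epochs) for one client.

    Assumption: grow additively after a completed round and shrink
    multiplicatively after a dropped one. This is a stand-in rule,
    not the predictor used in the paper.
    """
    if completed:
        return current + grow
    return max(floor, current * shrink)


if __name__ == "__main__":
    rng = random.Random(0)
    last_losses = {"client_a": 0.4, "client_b": 1.2, "client_c": 0.8}
    print(selection_probabilities(last_losses))
    print(sample_clients(last_losses, k=2, rng=rng))
    print(update_workload(4.0, completed=False))
```

In this sketch a client that fails to finish its assigned workload is given a smaller workload next time, which is one simple way to reduce stragglers; clients with a higher loss are sampled more often, which reflects the abstract's statement that larger-value clients are selected with higher probability.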

Keywords: Distributed machine learning; Federated learning; Heterogeneous systems; Neural networks.

MeSH terms

Humans
Machine Learning*
Workload*