CFL-HC: A Coded Federated Learning Framework for Heterogeneous Computing Scenarios
Dong Wang, Baoqian Wang, Jinran Zhang, K. Lu, Junfei Xie, Yan Wan, Shengli Fu
2021 IEEE Global Communications Conference (GLOBECOM), December 2021
DOI: 10.1109/GLOBECOM46510.2021.9685962
Citations: 1
Abstract
Federated learning (FL) is a promising machine learning paradigm because it allows distributed edge devices to collaboratively train a model without sharing their raw data. In practice, a major challenge for FL is that edge devices are heterogeneous, so slow devices may compromise the convergence of model training. To address this challenge, several recent studies have proposed different solutions, among which a promising approach is to utilize coded computing to facilitate the training of linear models. Nevertheless, the existing coded FL (CFL) scheme is limited by a fixed coding redundancy parameter, and a weight matrix used in the existing design may introduce unnecessary errors. In this paper, we tackle these issues and propose a novel framework, namely CFL-HC, to facilitate CFL in heterogeneous computing scenarios. In our framework, we consider a computing system consisting of a central server and multiple computing devices that hold original or coded datasets, and we specify an expected number of input-output pairs to be used in each round. Within this framework, we formulate an optimization problem to find the optimal deadline for each training round and the optimal size of the computing task allocated to each computing device, and we design a two-step optimization scheme to obtain the optimal solution. To evaluate the proposed framework, we develop a real CFL system using the message passing interface platform. Based on this system, we conduct numerical experiments, which demonstrate the advantages of the proposed framework in terms of both accuracy and convergence speed.
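The abstract's core idea — using coded computing so that slow devices cannot stall linear-model training — can be illustrated with a minimal sketch. This is not the paper's CFL-HC algorithm; it is a generic coded-gradient example under assumed parameters (4 data partitions, 6 workers, a random Gaussian encoding matrix `B`), showing how a server can recover the exact full gradient of a linear-regression loss from only the fastest `k` worker responses:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear-regression problem (sizes are illustrative assumptions).
k, n_workers, d = 4, 6, 3        # n_workers - k = 2 stragglers tolerated
X = rng.normal(size=(40, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=40)
w = np.zeros(d)                  # current model at some training round

# Split the data into k partitions; each partition's gradient of the
# squared loss is what one (uncoded) device would compute.
parts = np.array_split(np.arange(40), k)
g = np.stack([X[p].T @ (X[p] @ w - y[p]) for p in parts])   # shape (k, d)

# Random linear code: worker i returns the combination sum_j B[i, j] * g_j.
B = rng.normal(size=(n_workers, k))
coded = B @ g

# Suppose two workers straggle; decode from the fastest k responses.
fast = [0, 2, 3, 5]
g_hat = np.linalg.solve(B[fast], coded[fast])   # recover partition gradients
full_grad = g_hat.sum(axis=0)

# The decoded gradient matches the exact full-batch gradient.
exact = X.T @ (X @ w - y)
assert np.allclose(full_grad, exact)
```

Because `B` is a random Gaussian matrix, any `k` of its rows are invertible with probability one, so the server can proceed as soon as any `k` workers respond; the paper's framework additionally optimizes the per-round deadline and per-device task sizes, which this sketch does not attempt.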