Spread: Decentralized Model Aggregation for Scalable Federated Learning

Chuang Hu, Huang Huang Liang, Xiao Han, Bo Liu, D. Cheng, Dan Wang
{"title":"Spread: Decentralized Model Aggregation for Scalable Federated Learning","authors":"Chuang Hu, Huang Huang Liang, Xiao Han, Bo Liu, D. Cheng, Dan Wang","doi":"10.1145/3545008.3545030","DOIUrl":null,"url":null,"abstract":"Federated learning (FL) is a new distributed machine learning paradigm that enables machine learning on edge devices. One unique feature of FL is that edge devices belong to individuals; and since they are not “owned” by the FL coordinator, but can be “federated” instead, there can potentially be a huge number of edge devices. In the current distributed ML architecture, the parameter server (PS) architecture, model aggregation is centralized. When facing a large number of edge devices, the centralized model aggregation becomes the bottleneck and fundamentally restricts system scalability. In this paper, we present Spread to decentralize model aggregation. Spread is a tiered architecture where nodes are organized into clusters so that model aggregation can be offloaded to certain edge devices. We design a Spread-based FL system: it employs a new algorithm for cluster construction and an adaptive algorithm that regulates, in runtime, inter-cluster model training and intra-cluster model training. We present an implementation of a functional system by extending the Federated Learning system. Our evaluation shows that Spread can resolve the bottleneck of centralized model aggregation. Spread yields an 8.05 × and a 5.58 × model training speedup as compared to existing FL systems supported by the PS and allReduce architecture.","PeriodicalId":360504,"journal":{"name":"Proceedings of the 51st International Conference on Parallel Processing","volume":"159 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 51st International Conference on Parallel Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3545008.3545030","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Federated learning (FL) is a new distributed machine learning paradigm that enables machine learning on edge devices. One unique feature of FL is that edge devices belong to individuals: since they are not "owned" by the FL coordinator but merely "federated", there can potentially be a huge number of them. In the prevailing distributed ML architecture, the parameter server (PS) architecture, model aggregation is centralized. When facing a large number of edge devices, centralized model aggregation becomes the bottleneck and fundamentally restricts system scalability. In this paper, we present Spread, which decentralizes model aggregation. Spread is a tiered architecture in which nodes are organized into clusters so that model aggregation can be offloaded to certain edge devices. We design a Spread-based FL system: it employs a new algorithm for cluster construction and an adaptive algorithm that regulates, at runtime, inter-cluster and intra-cluster model training. We present an implementation of a functional system by extending the Federated Learning system. Our evaluation shows that Spread resolves the bottleneck of centralized model aggregation, yielding an 8.05× and a 5.58× model training speedup over existing FL systems supported by the PS and allReduce architectures, respectively.
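The abstract describes the tiered architecture but not the aggregation rule itself. The following is a minimal sketch of two-tier model aggregation, assuming a FedAvg-style sample-weighted average; the names (`Device`, `Cluster`, `intra_cluster_aggregate`, `inter_cluster_aggregate`) are illustrative placeholders, not Spread's actual API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

import numpy as np


@dataclass
class Device:
    """An edge device holding its locally trained model parameters."""
    model: np.ndarray   # flattened parameter vector after local training
    num_samples: int    # local dataset size, used as the averaging weight


@dataclass
class Cluster:
    """A group of devices served by one cluster head (itself an edge device)."""
    devices: List[Device] = field(default_factory=list)


def weighted_average(models: List[np.ndarray], weights: List[int]) -> np.ndarray:
    """FedAvg-style sample-weighted average of parameter vectors."""
    total = sum(weights)
    return sum(w * m for w, m in zip(weights, models)) / total


def intra_cluster_aggregate(cluster: Cluster) -> Tuple[np.ndarray, int]:
    """Tier 1: the cluster head averages its members' models locally,
    offloading this work from the central coordinator."""
    models = [d.model for d in cluster.devices]
    weights = [d.num_samples for d in cluster.devices]
    return weighted_average(models, weights), sum(weights)


def inter_cluster_aggregate(clusters: List[Cluster]) -> np.ndarray:
    """Tier 2: the coordinator combines one partial model per cluster,
    so its fan-in is the number of clusters, not the number of devices."""
    partials = [intra_cluster_aggregate(c) for c in clusters]
    models = [m for m, _ in partials]
    weights = [n for _, n in partials]
    return weighted_average(models, weights)
```

With sample-count weights, the two-tier average equals the flat weighted average over all devices, so this offloading preserves the aggregation result while reducing the coordinator's fan-in from the number of devices to the number of clusters. Per the abstract, Spread's adaptive algorithm additionally decides at runtime how to balance intra-cluster and inter-cluster training, a policy not reflected in this sketch.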