Spread: Decentralized Model Aggregation for Scalable Federated Learning

Chuang Hu, Huang Huang Liang, Xiao Han, Bo Liu, D. Cheng, Dan Wang
{"title":"扩展:用于可扩展联邦学习的分散模型聚合","authors":"Chuang Hu, Huang Huang Liang, Xiao Han, Bo Liu, D. Cheng, Dan Wang","doi":"10.1145/3545008.3545030","DOIUrl":null,"url":null,"abstract":"Federated learning (FL) is a new distributed machine learning paradigm that enables machine learning on edge devices. One unique feature of FL is that edge devices belong to individuals; and since they are not “owned” by the FL coordinator, but can be “federated” instead, there can potentially be a huge number of edge devices. In the current distributed ML architecture, the parameter server (PS) architecture, model aggregation is centralized. When facing a large number of edge devices, the centralized model aggregation becomes the bottleneck and fundamentally restricts system scalability. In this paper, we present Spread to decentralize model aggregation. Spread is a tiered architecture where nodes are organized into clusters so that model aggregation can be offloaded to certain edge devices. We design a Spread-based FL system: it employs a new algorithm for cluster construction and an adaptive algorithm that regulates, in runtime, inter-cluster model training and intra-cluster model training. We present an implementation of a functional system by extending the Federated Learning system. Our evaluation shows that Spread can resolve the bottleneck of centralized model aggregation. Spread yields an 8.05 × and a 5.58 × model training speedup as compared to existing FL systems supported by the PS and allReduce architecture.","PeriodicalId":360504,"journal":{"name":"Proceedings of the 51st International Conference on Parallel Processing","volume":"159 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Spread: Decentralized Model Aggregation for Scalable Federated Learning\",\"authors\":\"Chuang Hu, Huang Huang Liang, Xiao Han, Bo Liu, D. Cheng, Dan Wang\",\"doi\":\"10.1145/3545008.3545030\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Federated learning (FL) is a new distributed machine learning paradigm that enables machine learning on edge devices. One unique feature of FL is that edge devices belong to individuals; and since they are not “owned” by the FL coordinator, but can be “federated” instead, there can potentially be a huge number of edge devices. In the current distributed ML architecture, the parameter server (PS) architecture, model aggregation is centralized. When facing a large number of edge devices, the centralized model aggregation becomes the bottleneck and fundamentally restricts system scalability. In this paper, we present Spread to decentralize model aggregation. Spread is a tiered architecture where nodes are organized into clusters so that model aggregation can be offloaded to certain edge devices. We design a Spread-based FL system: it employs a new algorithm for cluster construction and an adaptive algorithm that regulates, in runtime, inter-cluster model training and intra-cluster model training. We present an implementation of a functional system by extending the Federated Learning system. Our evaluation shows that Spread can resolve the bottleneck of centralized model aggregation. 
Spread yields an 8.05 × and a 5.58 × model training speedup as compared to existing FL systems supported by the PS and allReduce architecture.\",\"PeriodicalId\":360504,\"journal\":{\"name\":\"Proceedings of the 51st International Conference on Parallel Processing\",\"volume\":\"159 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-08-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 51st International Conference on Parallel Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3545008.3545030\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 51st International Conference on Parallel Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3545008.3545030","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1

Abstract

Federated learning (FL) is a new distributed machine learning paradigm that enables machine learning on edge devices. One unique feature of FL is that edge devices belong to individuals; since they are not “owned” by the FL coordinator but can be “federated” instead, there can potentially be a huge number of edge devices. In the current distributed ML architecture, the parameter server (PS) architecture, model aggregation is centralized. When facing a large number of edge devices, centralized model aggregation becomes the bottleneck and fundamentally restricts system scalability. In this paper, we present Spread to decentralize model aggregation. Spread is a tiered architecture where nodes are organized into clusters so that model aggregation can be offloaded to certain edge devices. We design a Spread-based FL system: it employs a new algorithm for cluster construction and an adaptive algorithm that regulates, at runtime, inter-cluster and intra-cluster model training. We present an implementation of a functional system by extending the Federated Learning system. Our evaluation shows that Spread can resolve the bottleneck of centralized model aggregation. Spread yields an 8.05× and a 5.58× model training speedup as compared to existing FL systems supported by the PS and allReduce architectures.
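
The two-tier aggregation the abstract describes can be illustrated with a minimal sketch. This is not the paper's implementation: the names (`Cluster`, `weighted_average`, `spread_round`) and the FedAvg-style weighting by local dataset size are assumptions for illustration, and Spread's actual cluster-construction and adaptive runtime-scheduling algorithms are not reproduced here.

```python
# Minimal sketch of two-tier (intra-cluster, then inter-cluster) model
# aggregation in the spirit of Spread. All identifiers are illustrative,
# not the paper's API; the weighting scheme is assumed FedAvg-style.
from dataclasses import dataclass, field
from typing import Dict, List

import numpy as np

Model = Dict[str, np.ndarray]  # parameter name -> weight tensor


@dataclass
class Cluster:
    """A group of edge devices; one member acts as the cluster aggregator."""
    updates: List[Model] = field(default_factory=list)  # members' local models
    sizes: List[int] = field(default_factory=list)      # members' local dataset sizes


def weighted_average(models: List[Model], weights: List[int]) -> Model:
    """Weighted average of model parameters (FedAvg-style)."""
    total = float(sum(weights))
    return {
        k: sum(w * m[k] for m, w in zip(models, weights)) / total
        for k in models[0].keys()
    }


def spread_round(clusters: List[Cluster]) -> Model:
    """One aggregation round: each cluster aggregator averages its members
    locally (offloading work from the central coordinator), then the
    per-cluster models are combined across clusters."""
    cluster_models, cluster_sizes = [], []
    for c in clusters:
        cluster_models.append(weighted_average(c.updates, c.sizes))  # intra-cluster
        cluster_sizes.append(sum(c.sizes))
    return weighted_average(cluster_models, cluster_sizes)  # inter-cluster
```

Because each cluster aggregator averages its members' updates on an edge device, the top-level combination step handles only one model per cluster rather than one per device, which is the scalability argument the abstract makes against centralized PS-style aggregation.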