Spread: Decentralized Model Aggregation for Scalable Federated Learning

Chuang Hu, Huang Huang Liang, Xiao Han, Bo Liu, D. Cheng, Dan Wang
{"title":"扩展:用于可扩展联邦学习的分散模型聚合","authors":"Chuang Hu, Huang Huang Liang, Xiao Han, Bo Liu, D. Cheng, Dan Wang","doi":"10.1145/3545008.3545030","DOIUrl":null,"url":null,"abstract":"Federated learning (FL) is a new distributed machine learning paradigm that enables machine learning on edge devices. One unique feature of FL is that edge devices belong to individuals; and since they are not “owned” by the FL coordinator, but can be “federated” instead, there can potentially be a huge number of edge devices. In the current distributed ML architecture, the parameter server (PS) architecture, model aggregation is centralized. When facing a large number of edge devices, the centralized model aggregation becomes the bottleneck and fundamentally restricts system scalability. In this paper, we present Spread to decentralize model aggregation. Spread is a tiered architecture where nodes are organized into clusters so that model aggregation can be offloaded to certain edge devices. We design a Spread-based FL system: it employs a new algorithm for cluster construction and an adaptive algorithm that regulates, in runtime, inter-cluster model training and intra-cluster model training. We present an implementation of a functional system by extending the Federated Learning system. Our evaluation shows that Spread can resolve the bottleneck of centralized model aggregation. Spread yields an 8.05 × and a 5.58 × model training speedup as compared to existing FL systems supported by the PS and allReduce architecture.","PeriodicalId":360504,"journal":{"name":"Proceedings of the 51st International Conference on Parallel Processing","volume":"159 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Spread: Decentralized Model Aggregation for Scalable Federated Learning\",\"authors\":\"Chuang Hu, Huang Huang Liang, Xiao Han, Bo Liu, D. Cheng, Dan Wang\",\"doi\":\"10.1145/3545008.3545030\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Federated learning (FL) is a new distributed machine learning paradigm that enables machine learning on edge devices. One unique feature of FL is that edge devices belong to individuals; and since they are not “owned” by the FL coordinator, but can be “federated” instead, there can potentially be a huge number of edge devices. In the current distributed ML architecture, the parameter server (PS) architecture, model aggregation is centralized. When facing a large number of edge devices, the centralized model aggregation becomes the bottleneck and fundamentally restricts system scalability. In this paper, we present Spread to decentralize model aggregation. Spread is a tiered architecture where nodes are organized into clusters so that model aggregation can be offloaded to certain edge devices. We design a Spread-based FL system: it employs a new algorithm for cluster construction and an adaptive algorithm that regulates, in runtime, inter-cluster model training and intra-cluster model training. We present an implementation of a functional system by extending the Federated Learning system. Our evaluation shows that Spread can resolve the bottleneck of centralized model aggregation. 
Spread yields an 8.05 × and a 5.58 × model training speedup as compared to existing FL systems supported by the PS and allReduce architecture.\",\"PeriodicalId\":360504,\"journal\":{\"name\":\"Proceedings of the 51st International Conference on Parallel Processing\",\"volume\":\"159 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-08-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 51st International Conference on Parallel Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3545008.3545030\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 51st International Conference on Parallel Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3545008.3545030","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1

Abstract

Federated learning (FL) is a new distributed machine learning paradigm that enables machine learning on edge devices. One unique feature of FL is that edge devices belong to individuals; since they are not “owned” by the FL coordinator but can be “federated” instead, there can potentially be a huge number of edge devices. In the current distributed ML architecture, the parameter server (PS) architecture, model aggregation is centralized. When facing a large number of edge devices, centralized model aggregation becomes the bottleneck and fundamentally restricts system scalability. In this paper, we present Spread to decentralize model aggregation. Spread is a tiered architecture where nodes are organized into clusters so that model aggregation can be offloaded to certain edge devices. We design a Spread-based FL system: it employs a new algorithm for cluster construction and an adaptive algorithm that regulates, at runtime, inter-cluster and intra-cluster model training. We present an implementation of a functional system by extending the Federated Learning system. Our evaluation shows that Spread can resolve the bottleneck of centralized model aggregation. Spread yields an 8.05× and a 5.58× model training speedup as compared to existing FL systems supported by the PS and allReduce architectures.
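
The two-tier aggregation the abstract describes can be illustrated with a minimal sketch. This is not the paper's implementation: the names (`Cluster`, `weighted_average`, `spread_round`) and the FedAvg-style weighting by local dataset size are assumptions for illustration, and Spread's actual cluster-construction and adaptive runtime-scheduling algorithms are not reproduced here.

```python
# Minimal sketch of two-tier (intra-cluster, then inter-cluster) model
# aggregation in the spirit of Spread. All identifiers are illustrative,
# not the paper's API; the weighting scheme is assumed FedAvg-style.
from dataclasses import dataclass, field
from typing import Dict, List

import numpy as np

Model = Dict[str, np.ndarray]  # parameter name -> weight tensor


@dataclass
class Cluster:
    """A group of edge devices; one member acts as the cluster aggregator."""
    updates: List[Model] = field(default_factory=list)  # members' local models
    sizes: List[int] = field(default_factory=list)      # members' local dataset sizes


def weighted_average(models: List[Model], weights: List[int]) -> Model:
    """Weighted average of model parameters (FedAvg-style)."""
    total = float(sum(weights))
    return {
        k: sum(w * m[k] for m, w in zip(models, weights)) / total
        for k in models[0].keys()
    }


def spread_round(clusters: List[Cluster]) -> Model:
    """One aggregation round: each cluster aggregator averages its members
    locally (offloading work from the central coordinator), then the
    per-cluster models are combined across clusters."""
    cluster_models, cluster_sizes = [], []
    for c in clusters:
        cluster_models.append(weighted_average(c.updates, c.sizes))  # intra-cluster
        cluster_sizes.append(sum(c.sizes))
    return weighted_average(cluster_models, cluster_sizes)  # inter-cluster
```

Because each cluster aggregator averages its members' updates on an edge device, the top-level combination step handles only one model per cluster rather than one per device, which is the scalability argument the abstract makes against centralized PS-style aggregation.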