Spread: Decentralized Model Aggregation for Scalable Federated Learning

Proceedings of the 51st International Conference on Parallel Processing. Pub Date: 2022-08-29. DOI: 10.1145/3545008.3545030

Chuang Hu, Huang Huang Liang, Xiao Han, Bo Liu, D. Cheng, Dan Wang


Citations: 1

Abstract

Federated learning (FL) is a new distributed machine learning paradigm that enables machine learning on edge devices. A distinctive feature of FL is that edge devices belong to individuals: because they are not “owned” by the FL coordinator but are instead “federated,” the number of participating edge devices can be huge. In the current distributed ML architecture, the parameter server (PS) architecture, model aggregation is centralized. With a large number of edge devices, centralized model aggregation becomes the bottleneck and fundamentally restricts system scalability. In this paper, we present Spread to decentralize model aggregation. Spread is a tiered architecture in which nodes are organized into clusters so that model aggregation can be offloaded to certain edge devices. We design a Spread-based FL system: it employs a new algorithm for cluster construction and an adaptive algorithm that regulates, at runtime, inter-cluster and intra-cluster model training. We present an implementation of a functional system by extending the Federated Learning system. Our evaluation shows that Spread can resolve the bottleneck of centralized model aggregation. Spread yields 8.05× and 5.58× model training speedups compared with existing FL systems built on the PS and allReduce architectures, respectively.
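The tiered idea in the abstract, where selected edge devices aggregate on behalf of their cluster, can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not specify the aggregation rule, the cluster-construction algorithm, or the adaptive scheduling, so the sketch assumes a plain sample-weighted average (FedAvg-style) and a fixed, hand-written cluster assignment. All names (`weighted_average`, `tiered_aggregate`, `clusters`) are hypothetical.

```python
from typing import List, Sequence, Tuple

Model = List[float]          # flattened parameter vector (illustrative only)
Update = Tuple[Model, int]   # (parameters, number of local training samples)


def weighted_average(updates: Sequence[Update]) -> Update:
    """Sample-weighted average of parameter vectors (assumed FedAvg-style rule)."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    avg = [sum(w[i] * n for w, n in updates) / total for i in range(dim)]
    # Return the sample total too, so the next tier can weight this result.
    return avg, total


def tiered_aggregate(clusters: Sequence[Sequence[Update]]) -> Model:
    """Cluster heads average their members; the top tier averages the heads."""
    head_updates = [weighted_average(members) for members in clusters]
    global_model, _ = weighted_average(head_updates)
    return global_model


# Hypothetical example: four devices split into two fixed clusters.
clusters = [
    [([1.0, 2.0], 10), ([3.0, 4.0], 30)],
    [([5.0, 6.0], 20), ([7.0, 8.0], 40)],
]
flat_updates = [u for members in clusters for u in members]

tiered_model = tiered_aggregate(clusters)
flat_model, _ = weighted_average(flat_updates)
```

Under these assumptions, `tiered_model` and `flat_model` agree up to floating-point rounding, because a sample-weighted average of sample-weighted averages equals the flat weighted average. The difference is in who does the work: the top tier only handles one update per cluster rather than one per device, which is the kind of load a centralized aggregator would otherwise carry.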
