{"title":"联邦学习有多异步?","authors":"Ningxin Su, Baochun Li","doi":"10.1109/IWQoS54832.2022.9812885","DOIUrl":null,"url":null,"abstract":"As a practical paradigm designed to involve large numbers of edge devices in distributed training of deep learning models, federated learning has witnessed a significant amount of research attention in the recent years. Yet, most existing mechanisms on federated learning assumed either fully synchronous or asynchronous communication strategies between clients and the federated learning server. Existing designs that were partially asynchronous in their communication were simple heuristics, and were evaluated using the number of communication rounds or updates required for convergence, rather than the wall-clock time in practice.In this paper, we seek to explore the entire design space between fully synchronous and asynchronous mechanisms of communication. Based on insights from our exploration, we propose Port, a new partially asynchronous mechanism designed to allow fast clients to aggregate asynchronously, yet without waiting excessively for the slower ones. In addition, Port is designed to adjust the aggregation weights based on both the staleness and divergence of model updates, with provable convergence guarantees. We have implemented Port and its leading competitors in Plato, an open-source scalable federated learning research framework designed from the ground up to emulate real-world scenarios. 
With respect to the wall-clock time it takes for converging to the target accuracy, Port outperformed its closest competitor, FedBuff, by up to 40% in our experiments.","PeriodicalId":353365,"journal":{"name":"2022 IEEE/ACM 30th International Symposium on Quality of Service (IWQoS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":"{\"title\":\"How Asynchronous can Federated Learning Be?\",\"authors\":\"Ningxin Su, Baochun Li\",\"doi\":\"10.1109/IWQoS54832.2022.9812885\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"As a practical paradigm designed to involve large numbers of edge devices in distributed training of deep learning models, federated learning has witnessed a significant amount of research attention in the recent years. Yet, most existing mechanisms on federated learning assumed either fully synchronous or asynchronous communication strategies between clients and the federated learning server. Existing designs that were partially asynchronous in their communication were simple heuristics, and were evaluated using the number of communication rounds or updates required for convergence, rather than the wall-clock time in practice.In this paper, we seek to explore the entire design space between fully synchronous and asynchronous mechanisms of communication. Based on insights from our exploration, we propose Port, a new partially asynchronous mechanism designed to allow fast clients to aggregate asynchronously, yet without waiting excessively for the slower ones. In addition, Port is designed to adjust the aggregation weights based on both the staleness and divergence of model updates, with provable convergence guarantees. 
We have implemented Port and its leading competitors in Plato, an open-source scalable federated learning research framework designed from the ground up to emulate real-world scenarios. With respect to the wall-clock time it takes for converging to the target accuracy, Port outperformed its closest competitor, FedBuff, by up to 40% in our experiments.\",\"PeriodicalId\":353365,\"journal\":{\"name\":\"2022 IEEE/ACM 30th International Symposium on Quality of Service (IWQoS)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-06-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"9\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE/ACM 30th International Symposium on Quality of Service (IWQoS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IWQoS54832.2022.9812885\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE/ACM 30th International Symposium on Quality of Service (IWQoS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IWQoS54832.2022.9812885","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
As a practical paradigm designed to involve large numbers of edge devices in the distributed training of deep learning models, federated learning has attracted a significant amount of research attention in recent years. Yet, most existing mechanisms for federated learning assume either fully synchronous or fully asynchronous communication strategies between clients and the federated learning server. Existing designs that were partially asynchronous in their communication were simple heuristics, and were evaluated using the number of communication rounds or updates required for convergence, rather than the wall-clock time observed in practice. In this paper, we seek to explore the entire design space between fully synchronous and asynchronous mechanisms of communication. Based on insights from our exploration, we propose Port, a new partially asynchronous mechanism designed to allow fast clients to aggregate asynchronously, yet without waiting excessively for the slower ones. In addition, Port is designed to adjust the aggregation weights based on both the staleness and divergence of model updates, with provable convergence guarantees. We have implemented Port and its leading competitors in Plato, an open-source scalable federated learning research framework designed from the ground up to emulate real-world scenarios. With respect to the wall-clock time required to converge to the target accuracy, Port outperformed its closest competitor, FedBuff, by up to 40% in our experiments.
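The abstract does not spell out Port's weighting rule. As a minimal illustrative sketch only — the polynomial staleness decay and the divergence penalty below are assumptions for exposition, not the paper's actual formulas — a staleness- and divergence-aware aggregation step over a buffer of client updates might look like this:

```python
import numpy as np

def aggregate(server_model, updates, staleness_decay=0.5):
    """Combine buffered client updates into the server model.

    Each entry in `updates` is (delta, staleness). Weights shrink
    with staleness (assumed polynomial decay) and with divergence
    from the mean buffered update (assumed penalty); both choices
    are illustrative, not Port's published rule.
    """
    deltas = [delta for delta, _ in updates]
    mean_delta = np.mean(deltas, axis=0)

    weights = []
    for delta, staleness in updates:
        s_w = (1.0 + staleness) ** (-staleness_decay)  # discount stale updates
        div = np.linalg.norm(delta - mean_delta)       # distance from the mean update
        d_w = 1.0 / (1.0 + div)                        # discount divergent updates
        weights.append(s_w * d_w)

    weights = np.array(weights) / sum(weights)         # normalize to sum to 1
    combined = sum(w * d for w, d in zip(weights, deltas))
    return server_model + combined
```

In this sketch, a fresh update (staleness 0) that agrees with its peers receives full weight, while a stale or outlying update is discounted before the weighted sum is applied to the server model, which matches the qualitative behavior the abstract describes.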