KAFL: Achieving High Training Efficiency for Fast-K Asynchronous Federated Learning

Xueyu Wu, Cho-Li Wang
{"title":"KAFL:实现快速异步联合学习的高培训效率","authors":"Xueyu Wu, Cho-Li Wang","doi":"10.1109/ICDCS54860.2022.00089","DOIUrl":null,"url":null,"abstract":"Federated Averaging (FedAvg) and its variants are prevalent optimization algorithms adopted in Federated Learning (FL) as they show good model convergence. However, such optimization methods are mostly running in a synchronous flavor which is plagued by the straggler problem, especially in the real-world FL scenario. Federated learning involves a massive number of resource-weak edge devices connected to the intermittent networks, exhibiting a vastly heterogeneous training environment. The asynchronous setting is a plausible solution to fulfill the resources utilization. Yet, due to data and device heterogeneity, the training bias and model staleness dramatically downgrade the model performance. This paper presents KAFL, a fast-K Asynchronous Federated Learning framework, to improve the system and statistical efficiency. KAFL allows the global server to iteratively collect and aggregate (1) the parameters uploaded by the fastest K edge clients (K-FedAsync); or (2) the first M updated parameters sent from any clients (Mstep-FedAsync). Compared to the fully asynchronous setting, KAFL helps the server obtain a better direction toward the global optima as it collects the information from at least K clients or M parameters. To further improve the convergence speed of KAFL, we propose a new weighted aggregation method which dynamically adjusts the aggregation weights according to the weight deviation matrix and client contribution frequency. Experimental results show that KAFL achieves a significant time-to-target-accuracy speedup on both IID and Non-IID datasets. To achieve the same model accuracy, KAFL reduces more than 50% training time for five CNN and RNN models, demonstrating the high training efficiency of our proposed framework.","PeriodicalId":225883,"journal":{"name":"2022 IEEE 42nd International Conference on Distributed Computing Systems (ICDCS)","volume":"86 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"KAFL: Achieving High Training Efficiency for Fast-K Asynchronous Federated Learning\",\"authors\":\"Xueyu Wu, Cho-Li Wang\",\"doi\":\"10.1109/ICDCS54860.2022.00089\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Federated Averaging (FedAvg) and its variants are prevalent optimization algorithms adopted in Federated Learning (FL) as they show good model convergence. However, such optimization methods are mostly running in a synchronous flavor which is plagued by the straggler problem, especially in the real-world FL scenario. Federated learning involves a massive number of resource-weak edge devices connected to the intermittent networks, exhibiting a vastly heterogeneous training environment. The asynchronous setting is a plausible solution to fulfill the resources utilization. Yet, due to data and device heterogeneity, the training bias and model staleness dramatically downgrade the model performance. This paper presents KAFL, a fast-K Asynchronous Federated Learning framework, to improve the system and statistical efficiency. KAFL allows the global server to iteratively collect and aggregate (1) the parameters uploaded by the fastest K edge clients (K-FedAsync); or (2) the first M updated parameters sent from any clients (Mstep-FedAsync). 
Compared to the fully asynchronous setting, KAFL helps the server obtain a better direction toward the global optima as it collects the information from at least K clients or M parameters. To further improve the convergence speed of KAFL, we propose a new weighted aggregation method which dynamically adjusts the aggregation weights according to the weight deviation matrix and client contribution frequency. Experimental results show that KAFL achieves a significant time-to-target-accuracy speedup on both IID and Non-IID datasets. To achieve the same model accuracy, KAFL reduces more than 50% training time for five CNN and RNN models, demonstrating the high training efficiency of our proposed framework.\",\"PeriodicalId\":225883,\"journal\":{\"name\":\"2022 IEEE 42nd International Conference on Distributed Computing Systems (ICDCS)\",\"volume\":\"86 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE 42nd International Conference on Distributed Computing Systems (ICDCS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDCS54860.2022.00089\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 42nd International Conference on Distributed Computing Systems (ICDCS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDCS54860.2022.00089","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1

Abstract

Federated Averaging (FedAvg) and its variants are prevalent optimization algorithms in Federated Learning (FL) because they show good model convergence. However, these methods mostly run in a synchronous fashion and are therefore plagued by the straggler problem, especially in real-world FL scenarios. Federated learning involves a massive number of resource-constrained edge devices connected over intermittent networks, exhibiting a vastly heterogeneous training environment. An asynchronous setting is a plausible way to improve resource utilization, yet due to data and device heterogeneity, training bias and model staleness dramatically degrade model performance. This paper presents KAFL, a fast-K Asynchronous Federated Learning framework, to improve both system and statistical efficiency. KAFL allows the global server to iteratively collect and aggregate (1) the parameters uploaded by the fastest K edge clients (K-FedAsync), or (2) the first M updated parameters sent from any clients (Mstep-FedAsync). Compared to a fully asynchronous setting, KAFL helps the server obtain a better direction toward the global optimum because it collects information from at least K clients or M parameters. To further improve the convergence speed of KAFL, we propose a new weighted aggregation method which dynamically adjusts the aggregation weights according to a weight deviation matrix and client contribution frequency. Experimental results show that KAFL achieves a significant time-to-target-accuracy speedup on both IID and Non-IID datasets. To reach the same model accuracy, KAFL cuts training time by more than 50% for five CNN and RNN models, demonstrating the high training efficiency of the proposed framework.
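To make the fastest-K collection rule and the deviation/frequency-aware weighting described in the abstract more concrete, the sketch below shows one way such a server-side aggregation step could look. It is a minimal illustration assuming NumPy arrays for model parameters; the function name `aggregate_fastest_k`, the inverse frequency-times-deviation weighting, and the server mixing rate `base_lr` are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of a fastest-K aggregation step (not the authors' code).
import numpy as np

def aggregate_fastest_k(global_params, client_updates, contribution_counts,
                        k, base_lr=0.5):
    """Aggregate the first K client updates to arrive in this round.

    global_params       : np.ndarray, current global model parameters
    client_updates      : list of (client_id, np.ndarray) in arrival order
    contribution_counts : dict client_id -> number of past contributions
    k                   : number of fastest clients to wait for
    base_lr             : server-side mixing rate toward the aggregated update
    """
    # Take only the K updates that arrived first (the "fastest-K" rule).
    selected = client_updates[:k]

    # Down-weight clients that have already contributed many times and
    # clients whose parameters deviate strongly from the global model
    # (a stand-in for the paper's deviation/frequency-based weighting).
    weights = []
    for cid, params in selected:
        freq = contribution_counts.get(cid, 0) + 1
        deviation = np.linalg.norm(params - global_params) + 1e-8
        weights.append(1.0 / (freq * deviation))
    weights = np.array(weights)
    weights /= weights.sum()

    # Weighted average of the selected client parameters.
    aggregated = sum(w * p for w, (_, p) in zip(weights, selected))

    # Move the global model part-way toward the aggregated parameters.
    new_global = (1 - base_lr) * global_params + base_lr * aggregated

    # Record contributions so frequent clients are damped in later rounds.
    for cid, _ in selected:
        contribution_counts[cid] = contribution_counts.get(cid, 0) + 1
    return new_global

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    global_model = np.zeros(4)
    updates = [(cid, rng.normal(size=4)) for cid in ["a", "b", "c", "d"]]
    counts = {}
    global_model = aggregate_fastest_k(global_model, updates, counts, k=3)
    print(global_model, counts)
```

The Mstep-FedAsync variant described in the abstract would follow the same pattern but trigger aggregation once M parameter updates have arrived from any clients, rather than waiting for K distinct fastest clients.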