On the Convergence of Hybrid Federated Learning with Server-Clients Collaborative Training

2022 56th Annual Conference on Information Sciences and Systems (CISS). Pub Date: 2022-03-09. DOI: 10.1109/CISS53076.2022.9751161

Kun Yang, Cong Shen


Citations: 2

Abstract

State-of-the-art federated learning (FL) paradigms use data collected and stored on massively distributed clients to train a global machine learning (ML) model. Local datasets never leave the devices, and the server performs only simple model aggregation, which improves privacy protection. In reality, however, the parameter server often has access to a certain (possibly small) amount of data, and it is computationally more powerful than the clients. This work analyzes the convergence behavior of hybrid federated learning, which leverages the server dataset and the server's computation power for collaborative model training. Unlike standard FL, where stochastic gradient descent (SGD) is always computed in parallel across all clients, this architecture combines parallel SGD at the clients with sequential SGD at the server by using the aggregated model from the clients as a new starting point for server SGD. The main contribution of this work is a set of upper bounds on the convergence rate of this aggregate-then-advance hybrid FL design. In particular, when the local SGD keeps an $\mathcal{O}(1/t)$ stepsize, the server SGD must adjust its stepsize to scale no slower than $\mathcal{O}(1/t^{2})$ to strictly outperform local SGD with strongly convex loss functions. Numerical experiments on standard FL tasks demonstrate accuracy and convergence rate advantages over clients-only (FEDAVG) and server-only training.
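The aggregate-then-advance idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the scalar least-squares loss, the helper names (`sgd_step`, `hybrid_round`, `make_data`, `run`), the equal-weight averaging, the constants `c_local` and `c_server`, and the choice to index the stepsize by communication round `t` are all assumptions made for illustration. The sketch only mirrors the structure stated above: parallel local SGD with an $\mathcal{O}(1/t)$ stepsize, model averaging, then sequential server SGD with an $\mathcal{O}(1/t^{2})$ stepsize that starts from the aggregated model.

```python
import random


def sgd_step(w, sample, lr):
    """One SGD step on the squared loss 0.5 * (w*x - y)^2 for a scalar model w."""
    x, y = sample
    grad = (w * x - y) * x
    return w - lr * grad


def hybrid_round(w, client_data, server_data, t, local_steps, server_steps,
                 c_local=0.5, c_server=0.5, rng=random):
    """One aggregate-then-advance round (illustrative sketch, assumed details)."""
    lr_local = c_local / (t + 1)          # O(1/t) client stepsize (assumed constant)
    lr_server = c_server / (t + 1) ** 2   # O(1/t^2) server stepsize (assumed constant)

    # Parallel phase: every client starts from the same global model.
    local_models = []
    for data in client_data:
        w_k = w
        for _ in range(local_steps):
            w_k = sgd_step(w_k, rng.choice(data), lr_local)
        local_models.append(w_k)

    # Aggregation: equal-weight average of the client models.
    w_agg = sum(local_models) / len(local_models)

    # Sequential phase: server SGD continues from the aggregated model.
    for _ in range(server_steps):
        w_agg = sgd_step(w_agg, rng.choice(server_data), lr_server)
    return w_agg


def make_data(n, w_true, rng):
    """Synthetic samples y = w_true * x + noise (hypothetical toy data)."""
    data = []
    for _ in range(n):
        x = rng.uniform(0.5, 1.5)
        data.append((x, w_true * x + rng.gauss(0.0, 0.1)))
    return data


def run(rounds=200, seed=0):
    """Train a scalar model with 5 clients and a small server dataset."""
    rng = random.Random(seed)
    clients = [make_data(50, 2.0, rng) for _ in range(5)]
    server = make_data(20, 2.0, rng)
    w = 0.0
    for t in range(rounds):
        w = hybrid_round(w, clients, server, t,
                         local_steps=5, server_steps=5, rng=rng)
    return w


if __name__ == "__main__":
    print(run())  # should land close to the true slope 2.0
```

Setting `server_steps=0` reduces the round to a clients-only average, which gives a simple baseline for comparing against the hybrid scheme on this toy problem.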



