On the Convergence of Hybrid Federated Learning with Server-Clients Collaborative Training
Kun Yang, Cong Shen
2022 56th Annual Conference on Information Sciences and Systems (CISS), published 2022-03-09
DOI: 10.1109/CISS53076.2022.9751161 (https://doi.org/10.1109/CISS53076.2022.9751161)
Citations: 2
Abstract
State-of-the-art federated learning (FL) paradigms use data collected and stored on massively distributed clients to train a global machine learning (ML) model; local datasets never leave the devices, and the server performs only simple model aggregation for better privacy protection. In reality, however, the parameter server often has access to a certain (possibly small) amount of data, and it is computationally more powerful than the clients. This work analyzes the convergence behavior of hybrid federated learning, which leverages the server dataset and the server's computation power for collaborative model training. Unlike standard FL, where stochastic gradient descent (SGD) is always computed in parallel across all clients, this architecture combines parallel SGD at the clients with sequential SGD at the server by using the aggregated model from the clients as a new starting point for server SGD. The main contribution of this work is a set of convergence rate upper bounds for this aggregate-then-advance hybrid FL design. In particular, when the local SGD keeps an $\mathcal{O}(1/t)$ stepsize, the server SGD must adjust its stepsize to scale no slower than $\mathcal{O}(1/t^{2})$ to strictly outperform local SGD with strongly convex loss functions. Numerical experiments on standard FL tasks demonstrate accuracy and convergence rate advantages over clients-only (FEDAVG) and server-only training.
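The aggregate-then-advance loop described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the scalar quadratic loss, the function names (`grad`, `sgd_steps`, `hybrid_fl`), the exact stepsizes `1/t` and `1/t**2`, and the numbers of local and server steps are all assumptions chosen only to make the structure concrete. Each round, the clients run SGD in parallel from the current global model, the server averages their models, and the server then continues with sequential SGD on its own data, starting from the averaged model.

```python
import random


def grad(w, x):
    # Gradient of the (assumed) strongly convex loss f(w; x) = 0.5 * (w - x)**2.
    return w - x


def sgd_steps(w, data, steps, lr, rng):
    # Run `steps` plain SGD updates on `data`, one uniformly sampled point per step.
    for _ in range(steps):
        x = rng.choice(data)
        w -= lr * grad(w, x)
    return w


def hybrid_fl(client_data, server_data, rounds, local_steps=2, server_steps=2, seed=0):
    # Hypothetical aggregate-then-advance loop on a scalar model.
    rng = random.Random(seed)
    w = 0.0
    for t in range(1, rounds + 1):
        client_lr = 1.0 / t       # O(1/t) stepsize for local SGD (assumed constant 1)
        server_lr = 1.0 / t ** 2  # O(1/t^2) stepsize for server SGD (assumed constant 1)

        # Parallel phase: every client starts from the same global model.
        local_models = [
            sgd_steps(w, data, local_steps, client_lr, rng) for data in client_data
        ]

        # Aggregate: simple average of the client models.
        w = sum(local_models) / len(local_models)

        # Advance: sequential server SGD starting from the aggregated model.
        w = sgd_steps(w, server_data, server_steps, server_lr, rng)
    return w
```

With, for example, three clients holding `[1.0, 2.0, 3.0]`, `[2.0, 3.0, 4.0]`, and `[3.0, 4.0, 5.0]` and a server holding `[2.5, 3.0, 3.5]`, the minimizer of the averaged loss is 3.0, and `hybrid_fl(...)` with a few hundred rounds should return a value close to it. The toy problem says nothing about the paper's rates; it only shows where the two stepsize schedules enter the loop.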