{"title":"Fast Convergent Federated Learning via Decaying SGD Updates","authors":"Md Palash Uddin;Yong Xiang;Mahmudul Hasan;Yao Zhao;Youyang Qu;Longxiang Gao","doi":"10.1109/TBDATA.2025.3618454","DOIUrl":null,"url":null,"abstract":"Federated Learning (FL), a groundbreaking approach for collaborative model training across decentralized devices, maintains data privacy while constructing a decent global machine learning model. Conventional FL methods typically demand more communication rounds to achieve convergence in non-Independent and non-Identically Distributed (non-IID) data scenarios due to their reliance on fixed Stochastic Gradient Descent (SGD) updates at each Communication Round (CR). In this paper, we introduce a novel strategy to expedite the convergence of FL models, inspired by the insights from McMahan et al.’s seminal work. We focus on FL convergence via traditional SGD decay by introducing a dynamic adjusting mechanism for local epochs and local batch size. Our method adapts the decay of SGD updates during the training process, akin to decaying learning rates in classical optimization. Particularly, by adaptively reducing local epochs and increasing local batch size using their ongoing values and the CR as the model progresses, our method enhances convergence speed without compromising accuracy, specifically by effectively addressing challenges posed by non-IID data. We provide theoretical results of the benefits of the dynamic decay of SGD updates in FL scenarios. 
We demonstrate our method’s consistent outperformance regarding the global model’s communication speedup and convergence behavior through comprehensive experiments.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"12 1","pages":"186-199"},"PeriodicalIF":5.7000,"publicationDate":"2025-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Big Data","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/11194127/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0
Abstract
Federated Learning (FL), a groundbreaking approach for collaborative model training across decentralized devices, maintains data privacy while constructing an accurate global machine learning model. Conventional FL methods typically require many communication rounds to achieve convergence in non-Independent and Identically Distributed (non-IID) data scenarios because they rely on fixed Stochastic Gradient Descent (SGD) updates at each Communication Round (CR). In this paper, we introduce a novel strategy to expedite the convergence of FL models, inspired by insights from McMahan et al.'s seminal work. We target FL convergence via traditional SGD decay by introducing a dynamic adjustment mechanism for the local epochs and local batch size. Our method decays the SGD updates during training, akin to decaying the learning rate in classical optimization. In particular, by adaptively reducing the local epochs and increasing the local batch size as functions of their ongoing values and the current CR, our method accelerates convergence without compromising accuracy, effectively addressing the challenges posed by non-IID data. We provide a theoretical analysis of the benefits of dynamically decaying SGD updates in FL scenarios. Through comprehensive experiments, we demonstrate that our method consistently outperforms baselines in the global model's communication speedup and convergence behavior.
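The abstract's core mechanism (shrinking local epochs and growing local batch size as the communication round advances) can be sketched as a round-indexed schedule. The decay forms below (harmonic decay of epochs, linear growth of batch size, with hyperparameters `e0`, `b0`, `b_max`, `gamma`) are illustrative assumptions for exposition, not the schedules from the paper itself:

```python
def decayed_local_config(round_idx, e0=10, b0=32, b_max=512, gamma=0.5):
    """Hypothetical per-round schedule for local SGD updates in FL.

    Returns (local_epochs, local_batch_size) for communication round
    `round_idx`. Epochs decay harmonically toward 1; batch size grows
    linearly up to a cap, so the effective SGD noise shrinks over rounds,
    analogous to a decaying learning rate in classical optimization.
    """
    epochs = max(1, round(e0 / (1 + gamma * round_idx)))
    batch = min(b_max, int(b0 * (1 + gamma * round_idx)))
    return epochs, batch

if __name__ == "__main__":
    # A server-side training loop would query the schedule each round
    # before broadcasting the local-update configuration to clients.
    for t in range(5):
        print(f"round {t}: epochs, batch = {decayed_local_config(t)}")
```

Under this sketch, early rounds take many small-batch local steps (fast initial progress), while later rounds take fewer, larger-batch steps (lower client drift on non-IID data), which is the intuition the abstract attributes to the method.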
About the Journal
The IEEE Transactions on Big Data publishes peer-reviewed articles focusing on big data. These articles present innovative research ideas and application results across disciplines, including novel theories, algorithms, and applications. Research areas cover a wide range, such as big data analytics, visualization, curation, management, semantics, infrastructure, standards, performance analysis, intelligence extraction, scientific discovery, security, privacy, and legal issues specific to big data. The journal also prioritizes applications of big data in fields generating massive datasets.