Adaptive Federated Learning Over the Air

IF 5.8 | CAS Tier 2 (Engineering & Technology) | JCR Q1: ENGINEERING, ELECTRICAL & ELECTRONIC
Chenhao Wang;Zihan Chen;Nikolaos Pappas;Howard H. Yang;Tony Q. S. Quek;H. Vincent Poor
{"title":"空中自适应联邦学习","authors":"Chenhao Wang;Zihan Chen;Nikolaos Pappas;Howard H. Yang;Tony Q. S. Quek;H. Vincent Poor","doi":"10.1109/TSP.2025.3585002","DOIUrl":null,"url":null,"abstract":"We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training. This approach capitalizes on the inherent superposition property of wireless channels, facilitating fast and scalable parameter aggregation. Meanwhile, it enhances the robustness of the model training process by dynamically adjusting the stepsize in accordance with the global gradient update. We derive the convergence rate of the training algorithms for a broad spectrum of nonconvex loss functions, encompassing the effects of channel fading, and interference that follows a heavy-tailed distribution. Our analysis shows that the AdaGrad-based algorithm converges to a stationary point at the rate of <inline-formula><tex-math>$\\mathcal{O}(\\ln{(T)}/{T^{1-\\frac{1}{\\alpha}}})$</tex-math></inline-formula>, where <inline-formula><tex-math>$\\alpha$</tex-math></inline-formula> represents the tail index of the electromagnetic interference. This result indicates that the level of heavy-tailedness in interference distribution plays a crucial role in the training efficiency: the heavier the tail, the slower the algorithm converges. In contrast, an Adam-like algorithm converges at the <inline-formula><tex-math>$\\mathcal{O}(1/T)$</tex-math></inline-formula> rate, demonstrating its advantage in expediting the model training process. We conduct extensive experiments that corroborate our theoretical findings and affirm the practical efficacy of our proposed federated adaptive gradient methods.","PeriodicalId":13330,"journal":{"name":"IEEE Transactions on Signal Processing","volume":"73 ","pages":"3187-3202"},"PeriodicalIF":5.8000,"publicationDate":"2025-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Adaptive Federated Learning Over the Air\",\"authors\":\"Chenhao Wang;Zihan Chen;Nikolaos Pappas;Howard H. Yang;Tony Q. S. Quek;H. Vincent Poor\",\"doi\":\"10.1109/TSP.2025.3585002\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training. This approach capitalizes on the inherent superposition property of wireless channels, facilitating fast and scalable parameter aggregation. Meanwhile, it enhances the robustness of the model training process by dynamically adjusting the stepsize in accordance with the global gradient update. We derive the convergence rate of the training algorithms for a broad spectrum of nonconvex loss functions, encompassing the effects of channel fading, and interference that follows a heavy-tailed distribution. Our analysis shows that the AdaGrad-based algorithm converges to a stationary point at the rate of <inline-formula><tex-math>$\\\\mathcal{O}(\\\\ln{(T)}/{T^{1-\\\\frac{1}{\\\\alpha}}})$</tex-math></inline-formula>, where <inline-formula><tex-math>$\\\\alpha$</tex-math></inline-formula> represents the tail index of the electromagnetic interference. This result indicates that the level of heavy-tailedness in interference distribution plays a crucial role in the training efficiency: the heavier the tail, the slower the algorithm converges. 
In contrast, an Adam-like algorithm converges at the <inline-formula><tex-math>$\\\\mathcal{O}(1/T)$</tex-math></inline-formula> rate, demonstrating its advantage in expediting the model training process. We conduct extensive experiments that corroborate our theoretical findings and affirm the practical efficacy of our proposed federated adaptive gradient methods.\",\"PeriodicalId\":13330,\"journal\":{\"name\":\"IEEE Transactions on Signal Processing\",\"volume\":\"73 \",\"pages\":\"3187-3202\"},\"PeriodicalIF\":5.8000,\"publicationDate\":\"2025-07-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Signal Processing\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/11079930/\",\"RegionNum\":2,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Signal Processing","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/11079930/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0

Abstract

We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training. This approach capitalizes on the inherent superposition property of wireless channels, facilitating fast and scalable parameter aggregation. Meanwhile, it enhances the robustness of the model training process by dynamically adjusting the stepsize in accordance with the global gradient update. We derive the convergence rate of the training algorithms for a broad spectrum of nonconvex loss functions, encompassing the effects of channel fading, and interference that follows a heavy-tailed distribution. Our analysis shows that the AdaGrad-based algorithm converges to a stationary point at the rate of $\mathcal{O}(\ln{(T)}/{T^{1-\frac{1}{\alpha}}})$, where $\alpha$ represents the tail index of the electromagnetic interference. This result indicates that the level of heavy-tailedness in interference distribution plays a crucial role in the training efficiency: the heavier the tail, the slower the algorithm converges. In contrast, an Adam-like algorithm converges at the $\mathcal{O}(1/T)$ rate, demonstrating its advantage in expediting the model training process. We conduct extensive experiments that corroborate our theoretical findings and affirm the practical efficacy of our proposed federated adaptive gradient methods.
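To make the aggregate-and-update loop concrete, the following is a minimal, purely illustrative Python/NumPy sketch of over-the-air federated AdaGrad in the spirit of the abstract. The quadratic client losses, channel-inversion power control, Pareto-style heavy-tailed interference, and every constant (dimension, number of clients, stepsize, tail index alpha = 1.8) are assumptions made for this example, not details taken from the paper.

```python
# Hypothetical sketch only: a toy simulation of over-the-air federated AdaGrad.
# Objectives, channel model, interference model, and constants are illustrative
# assumptions, not the paper's actual protocol or parameters.
import numpy as np

rng = np.random.default_rng(0)
d, num_clients, T = 10, 20, 500   # model dimension, clients, communication rounds
alpha = 1.8                       # assumed tail index of the interference
eta, eps = 0.1, 1e-8              # base stepsize and numerical floor

# Toy local objectives: client k minimizes f_k(x) = ||x - b_k||^2 / 2
b = rng.normal(size=(num_clients, d))
x = np.zeros(d)                   # global model
v = np.zeros(d)                   # AdaGrad accumulator of squared aggregate gradients

def interference(size):
    # Crude heavy-tailed noise with tail index alpha (signed, scaled Pareto draws).
    signs = rng.choice([-1.0, 1.0], size=size)
    return 0.01 * signs * (rng.pareto(alpha, size=size) + 1.0)

for t in range(T):
    grads = x[None, :] - b                                    # each client's local gradient
    h = np.abs(rng.normal(1.0, 0.1, size=(num_clients, 1)))   # fading gains
    tx = grads / h                                            # clients pre-invert their own channel
    rx = np.sum(h * tx, axis=0)                               # superposition: the channel adds the signals
    g = rx / num_clients + interference(d)                    # noisy aggregate gradient at the server
    v += g ** 2                                               # AdaGrad: accumulate squared gradients
    x -= eta * g / (np.sqrt(v) + eps)                         # adaptive-stepsize update

print("distance to the average optimum:", np.linalg.norm(x - b.mean(axis=0)))
```

An Adam-style variant would instead track exponential moving averages of the aggregate gradient and its square in place of the raw sum and cumulative accumulator, which is the mechanism the abstract credits with the faster O(1/T) rate.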
Source journal
IEEE Transactions on Signal Processing (Engineering: Electrical & Electronic)
CiteScore: 11.20
Self-citation rate: 9.30%
Articles per year: 310
Review time: 3.0 months
Journal description: The IEEE Transactions on Signal Processing covers novel theory, algorithms, performance analyses and applications of techniques for the processing, understanding, learning, retrieval, mining, and extraction of information from signals. The term "signal" includes, among others, audio, video, speech, image, communication, geophysical, sonar, radar, medical and musical signals. Examples of topics of interest include, but are not limited to, information processing and the theory and application of filtering, coding, transmitting, estimating, detecting, analyzing, recognizing, synthesizing, recording, and reproducing signals.