Chenhao Wang;Zihan Chen;Nikolaos Pappas;Howard H. Yang;Tony Q. S. Quek;H. Vincent Poor
{"title":"空中自适应联邦学习","authors":"Chenhao Wang;Zihan Chen;Nikolaos Pappas;Howard H. Yang;Tony Q. S. Quek;H. Vincent Poor","doi":"10.1109/TSP.2025.3585002","DOIUrl":null,"url":null,"abstract":"We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training. This approach capitalizes on the inherent superposition property of wireless channels, facilitating fast and scalable parameter aggregation. Meanwhile, it enhances the robustness of the model training process by dynamically adjusting the stepsize in accordance with the global gradient update. We derive the convergence rate of the training algorithms for a broad spectrum of nonconvex loss functions, encompassing the effects of channel fading, and interference that follows a heavy-tailed distribution. Our analysis shows that the AdaGrad-based algorithm converges to a stationary point at the rate of <inline-formula><tex-math>$\\mathcal{O}(\\ln{(T)}/{T^{1-\\frac{1}{\\alpha}}})$</tex-math></inline-formula>, where <inline-formula><tex-math>$\\alpha$</tex-math></inline-formula> represents the tail index of the electromagnetic interference. This result indicates that the level of heavy-tailedness in interference distribution plays a crucial role in the training efficiency: the heavier the tail, the slower the algorithm converges. In contrast, an Adam-like algorithm converges at the <inline-formula><tex-math>$\\mathcal{O}(1/T)$</tex-math></inline-formula> rate, demonstrating its advantage in expediting the model training process. We conduct extensive experiments that corroborate our theoretical findings and affirm the practical efficacy of our proposed federated adaptive gradient methods.","PeriodicalId":13330,"journal":{"name":"IEEE Transactions on Signal Processing","volume":"73 ","pages":"3187-3202"},"PeriodicalIF":5.8000,"publicationDate":"2025-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Adaptive Federated Learning Over the Air\",\"authors\":\"Chenhao Wang;Zihan Chen;Nikolaos Pappas;Howard H. Yang;Tony Q. S. Quek;H. Vincent Poor\",\"doi\":\"10.1109/TSP.2025.3585002\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training. This approach capitalizes on the inherent superposition property of wireless channels, facilitating fast and scalable parameter aggregation. Meanwhile, it enhances the robustness of the model training process by dynamically adjusting the stepsize in accordance with the global gradient update. We derive the convergence rate of the training algorithms for a broad spectrum of nonconvex loss functions, encompassing the effects of channel fading, and interference that follows a heavy-tailed distribution. Our analysis shows that the AdaGrad-based algorithm converges to a stationary point at the rate of <inline-formula><tex-math>$\\\\mathcal{O}(\\\\ln{(T)}/{T^{1-\\\\frac{1}{\\\\alpha}}})$</tex-math></inline-formula>, where <inline-formula><tex-math>$\\\\alpha$</tex-math></inline-formula> represents the tail index of the electromagnetic interference. This result indicates that the level of heavy-tailedness in interference distribution plays a crucial role in the training efficiency: the heavier the tail, the slower the algorithm converges. 
In contrast, an Adam-like algorithm converges at the <inline-formula><tex-math>$\\\\mathcal{O}(1/T)$</tex-math></inline-formula> rate, demonstrating its advantage in expediting the model training process. We conduct extensive experiments that corroborate our theoretical findings and affirm the practical efficacy of our proposed federated adaptive gradient methods.\",\"PeriodicalId\":13330,\"journal\":{\"name\":\"IEEE Transactions on Signal Processing\",\"volume\":\"73 \",\"pages\":\"3187-3202\"},\"PeriodicalIF\":5.8000,\"publicationDate\":\"2025-07-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Signal Processing\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/11079930/\",\"RegionNum\":2,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Signal Processing","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/11079930/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training. This approach capitalizes on the inherent superposition property of wireless channels, facilitating fast and scalable parameter aggregation. Meanwhile, it enhances the robustness of the model training process by dynamically adjusting the stepsize in accordance with the global gradient update. We derive the convergence rate of the training algorithms for a broad spectrum of nonconvex loss functions, encompassing the effects of channel fading and interference that follows a heavy-tailed distribution. Our analysis shows that the AdaGrad-based algorithm converges to a stationary point at the rate of $\mathcal{O}(\ln{(T)}/{T^{1-\frac{1}{\alpha}}})$, where $\alpha$ represents the tail index of the electromagnetic interference. This result indicates that the level of heavy-tailedness in the interference distribution plays a crucial role in the training efficiency: the heavier the tail, the slower the algorithm converges. In contrast, an Adam-like algorithm converges at the $\mathcal{O}(1/T)$ rate, demonstrating its advantage in expediting the model training process. We conduct extensive experiments that corroborate our theoretical findings and affirm the practical efficacy of our proposed federated adaptive gradient methods.
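To make the setup described in the abstract concrete, below is a minimal, illustrative sketch (not the authors' code) of federated AdaGrad with over-the-air gradient aggregation. It assumes a toy least-squares objective, unit-mean Rayleigh fading per client, and Student-t noise as a stand-in for the heavy-tailed interference analyzed in the paper; all names and hyperparameters are hypothetical choices for illustration.

```python
# Sketch: over-the-air federated AdaGrad on a synthetic least-squares problem.
# Assumptions (not from the paper's code): Rayleigh fading, Student-t interference
# with df=1.8 as a heavy-tailed proxy, and hand-picked hyperparameters.
import numpy as np

rng = np.random.default_rng(0)

K, d, n_local = 10, 20, 50          # clients, model dimension, samples per client
w_true = rng.normal(size=d)

# Synthetic local datasets: y = X @ w_true + noise
data = []
for _ in range(K):
    X = rng.normal(size=(n_local, d))
    y = X @ w_true + 0.1 * rng.normal(size=n_local)
    data.append((X, y))

def local_gradient(w, X, y):
    """Gradient of the local least-squares loss (1/2n)||Xw - y||^2."""
    return X.T @ (X @ w - y) / len(y)

w = np.zeros(d)
accum = np.zeros(d)                 # AdaGrad accumulator of squared aggregated gradients
eta, eps, T = 0.5, 1e-8, 200

for t in range(T):
    # Each client transmits its gradient over the shared channel; the channel
    # superimposes them, scaled by per-client fading gains, plus interference.
    h = rng.rayleigh(scale=np.sqrt(2 / np.pi), size=K)       # fading gains, mean ~1
    interference = 0.05 * rng.standard_t(df=1.8, size=d)      # heavy-tailed additive term
    rx = sum(h[k] * local_gradient(w, *data[k]) for k in range(K)) + interference
    g = rx / K                                                 # server-side scaling

    # AdaGrad-style step: accumulate squared coordinates, shrink the stepsize.
    accum += g ** 2
    w -= eta * g / (np.sqrt(accum) + eps)

print("final average loss:", 0.5 * np.mean([(X @ w - y) ** 2 for X, y in data]))
```

An Adam-like variant would replace the raw accumulator with exponential moving averages of the aggregated gradient and its square, which is the mechanism behind the faster $\mathcal{O}(1/T)$ rate reported in the paper.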
Journal overview:
The IEEE Transactions on Signal Processing covers novel theory, algorithms, performance analyses and applications of techniques for the processing, understanding, learning, retrieval, mining, and extraction of information from signals. The term “signal” includes, among others, audio, video, speech, image, communication, geophysical, sonar, radar, medical and musical signals. Examples of topics of interest include, but are not limited to, information processing and the theory and application of filtering, coding, transmitting, estimating, detecting, analyzing, recognizing, synthesizing, recording, and reproducing signals.