Model Pruning for Distributed Learning Over the Air
Zhongyuan Zhao; Kailei Xu; Wei Hong; Mugen Peng; Zhiguo Ding; Tony Q. S. Quek; Howard H. Yang
IEEE Transactions on Signal Processing, vol. 72, pp. 5533-5549, published 2024-10-24
DOI: 10.1109/TSP.2024.3486169
URL: https://ieeexplore.ieee.org/document/10734153/
Journal impact factor: 4.6; JCR: Q1 (Engineering, Electrical & Electronic)
Citations: 0
Abstract
Analog over-the-air (A-OTA) computing is an effective approach to achieving distributed learning among multiple end-user devices within a bandwidth-constrained spectrum. In this paradigm, users’ intermediate parameters, such as gradients, are modulated onto a set of common waveforms and concurrently transmitted to the parameter server. Benefiting from the superposition property of multi-access channels, the server can obtain an automatically aggregated global gradient from the received signal without decoding any individual user's information. Nonetheless, the scarcity of orthogonal waveforms prevents such a paradigm from adopting complex deep learning models. In this paper, we develop model pruning strategies for A-OTA distributed learning, balancing the tradeoff between communication efficiency and learning performance. Specifically, we design an importance measure to evaluate the contribution of each entry in the model parameter based on the noisy aggregated gradient introduced by A-OTA computing. We also derive an analytical expression for the training error bound, which shows that the proposed scheme can converge even when the aggregated gradient is corrupted by heavy-tailed interference with unbounded variance. We further improve the developed algorithm by incorporating the momentum method to (a) enhance the design of the importance measure and (b) accelerate the model convergence rate. Extensive experiments are conducted to validate the performance gains achieved by our proposed scheme and verify the correctness of the analytical results.
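The pipeline the abstract describes (superposed gradients, heavy-tailed interference, importance-based pruning) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the paper's method: the function names are hypothetical, the interference is a symmetric Pareto stand-in (infinite variance when `alpha` is at most 2) for whatever heavy-tailed model the paper uses, and the importance score `|w * g|` is a generic saliency heuristic, since the abstract does not state the actual importance measure.

```python
import random


def ota_aggregate(user_gradients, noise_scale=0.1, alpha=1.5, rng=None):
    """Entry-wise superposition of user gradients plus heavy-tailed noise.

    Assumption: symmetric Pareto noise stands in for the channel
    interference; its variance is infinite for alpha <= 2.
    """
    rng = rng or random.Random(0)
    n_users = len(user_gradients)
    dim = len(user_gradients[0])
    aggregated = []
    for j in range(dim):
        # The multi-access channel adds the users' signals "for free".
        superposed = sum(g[j] for g in user_gradients)
        sign = rng.choice((-1.0, 1.0))
        noise = noise_scale * sign * (rng.paretovariate(alpha) - 1.0)
        aggregated.append((superposed + noise) / n_users)
    return aggregated


def importance_scores(weights, aggregated_gradient):
    """Hypothetical importance measure: |weight * noisy gradient| per entry."""
    return [abs(w * g) for w, g in zip(weights, aggregated_gradient)]


def top_k_mask(scores, keep):
    """Binary mask that keeps the `keep` highest-scoring entries."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    kept = set(order[:keep])
    return [1 if j in kept else 0 for j in range(len(scores))]


# Toy example: two users, a four-entry model, noise switched off.
weights = [0.8, -0.05, 1.2, 0.01]
user_gradients = [[0.5, 0.0, -0.4, 0.0], [0.3, 0.1, -0.6, 0.0]]
clean = ota_aggregate(user_gradients, noise_scale=0.0)
mask = top_k_mask(importance_scores(weights, clean), keep=2)  # [1, 0, 1, 0]
```

Only the entries selected by the mask would need a waveform in the next round, which is how pruning would ease the shortage of orthogonal waveforms. With a nonzero `noise_scale`, the scores are computed from the corrupted gradient, so the mask itself becomes noisy.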
About the journal:
The IEEE Transactions on Signal Processing covers novel theory, algorithms, performance analyses and applications of techniques for the processing, understanding, learning, retrieval, mining, and extraction of information from signals. The term “signal” includes, among others, audio, video, speech, image, communication, geophysical, sonar, radar, medical and musical signals. Examples of topics of interest include, but are not limited to, information processing and the theory and application of filtering, coding, transmitting, estimating, detecting, analyzing, recognizing, synthesizing, recording, and reproducing signals.