Adaptive Top-K in SGD for Communication-Efficient Distributed Learning in Multi-Robot Collaboration

IF 8.7 · JCR Q1 · Engineering, Electrical & Electronic
Mengzhe Ruan;Guangfeng Yan;Yuanzhang Xiao;Linqi Song;Weitao Xu
DOI: 10.1109/JSTSP.2024.3381373
Journal: IEEE Journal of Selected Topics in Signal Processing, vol. 18, no. 3, pp. 487-501
Published: 2024-04-05 (Journal Article)
Full text: https://ieeexplore.ieee.org/document/10493123/
Citations: 0

Abstract

Distributed stochastic gradient descent (D-SGD) with gradient compression has become a popular communication-efficient approach for accelerating optimization in distributed learning systems such as multi-robot systems. A commonly used compression method is Top-K sparsification, which sparsifies the gradients to a fixed degree throughout model training. However, there has been no systematically analyzed adaptive approach for adjusting the sparsification degree to maximize model performance or training speed. This paper proposes a novel adaptive Top-K in SGD framework that chooses the degree of sparsification adaptively at each gradient descent step, optimizing convergence by balancing the trade-off between communication cost and convergence error as a function of the gradient norm and the communication budget. First, an upper bound on the convergence error is derived for the adaptive sparsification scheme and the loss function. Second, we consider communication budget constraints and formulate an optimization problem that minimizes the deep model's convergence error under those constraints, yielding an enhanced compression algorithm that significantly improves model accuracy for a given communication budget. Finally, we conduct numerical experiments on general image classification tasks using the MNIST and CIFAR-10 datasets; for the multi-robot collaboration setting, we choose the object detection task on the PASCAL VOC dataset. The results demonstrate that the proposed adaptive Top-K algorithm in SGD achieves a significantly better convergence rate than state-of-the-art methods, even after accounting for error compensation.
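To make the abstract's ingredients concrete, the following is a minimal illustrative sketch (not the authors' implementation) of Top-K gradient sparsification with error feedback, plus a purely hypothetical heuristic for adapting K to the gradient norm under a per-step budget; the function names, the `norm_ref` parameter, and the linear adaptation rule are assumptions for illustration only — the paper derives its adaptive degree from a convergence-error bound instead.

```python
import numpy as np

def top_k_sparsify(grad: np.ndarray, k: int) -> np.ndarray:
    """Keep the k largest-magnitude entries of grad; zero out the rest."""
    flat = grad.ravel()
    if k >= flat.size:
        return grad.copy()
    # Indices of the k entries with the largest absolute value.
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    out = np.zeros_like(flat)
    out[idx] = flat[idx]
    return out.reshape(grad.shape)

def adaptive_k(grad: np.ndarray, k_min: int, k_max: int, norm_ref: float) -> int:
    """Hypothetical heuristic: spend more of the communication budget
    (larger k) when the gradient norm is large relative to norm_ref."""
    scale = min(1.0, float(np.linalg.norm(grad)) / norm_ref)
    return max(k_min, int(k_min + scale * (k_max - k_min)))

class ErrorFeedbackWorker:
    """Accumulates the sparsification residual and re-adds it next step
    (the 'error compensation' the abstract refers to)."""

    def __init__(self, dim: int):
        self.residual = np.zeros(dim)

    def compress(self, grad: np.ndarray, k: int) -> np.ndarray:
        corrected = grad + self.residual        # re-inject past error
        sparse = top_k_sparsify(corrected, k)   # transmit only k values
        self.residual = corrected - sparse      # carry the rest forward
        return sparse
```

In a D-SGD loop, each worker would call `adaptive_k` on its local gradient, send the output of `compress` (k values plus indices) instead of the dense gradient, and the server would average the sparse contributions.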
Source journal: IEEE Journal of Selected Topics in Signal Processing (Engineering: Electrical & Electronic)
CiteScore: 19.00
Self-citation rate: 1.30%
Articles published: 135
Review time: 3 months
About the journal: The IEEE Journal of Selected Topics in Signal Processing (JSTSP) focuses on the Field of Interest of the IEEE Signal Processing Society, which encompasses the theory and application of various signal processing techniques. These techniques include filtering, coding, transmitting, estimating, detecting, analyzing, recognizing, synthesizing, recording, and reproducing signals using digital or analog devices. The term "signal" covers a wide range of data types, including audio, video, speech, image, communication, geophysical, sonar, radar, medical, musical, and others. The journal format allows for in-depth exploration of signal processing topics, enabling the Society to cover both established and emerging areas. This includes interdisciplinary fields such as biomedical engineering and language processing, as well as areas not traditionally associated with engineering.