Stochastic distributed learning with gradient quantization and double-variance reduction

Samuel Horváth, D. Kovalev, Konstantin Mishchenko, Peter Richtárik, S. Stich
{"title":"基于梯度量化和双方差约简的随机分布式学习","authors":"Samuel Horváth, D. Kovalev, Konstantin Mishchenko, Peter Richtárik, S. Stich","doi":"10.1080/10556788.2022.2117355","DOIUrl":null,"url":null,"abstract":"ABSTRACT We consider distributed optimization over several devices, each sending incremental model updates to a central server. This setting is considered, for instance, in federated learning. Various schemes have been designed to compress the model updates in order to reduce the overall communication cost. However, existing methods suffer from a significant slowdown due to additional variance coming from the compression operator and as a result, only converge sublinearly. What is needed is a variance reduction technique for taming the variance introduced by compression. We propose the first methods that achieve linear convergence for arbitrary compression operators. For strongly convex functions with condition number κ, distributed among n machines with a finite-sum structure, each worker having less than m components, we also (i) give analysis for the weakly convex and the non-convex cases and (ii) verify in experiments that our novel variance reduced schemes are more efficient than the baselines. Moreover, we show theoretically that as the number of devices increases, higher compression levels are possible without this affecting the overall number of communications in comparison with methods that do not perform any compression. This leads to a significant reduction in communication cost. Our general analysis allows to pick the most suitable compression for each problem, finding the right balance between additional variance and communication savings. Finally, we also (iii) give analysis for arbitrary quantized updates.","PeriodicalId":124811,"journal":{"name":"Optimization Methods and Software","volume":"128 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":"{\"title\":\"Stochastic distributed learning with gradient quantization and double-variance reduction\",\"authors\":\"Samuel Horváth, D. Kovalev, Konstantin Mishchenko, Peter Richtárik, S. Stich\",\"doi\":\"10.1080/10556788.2022.2117355\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"ABSTRACT We consider distributed optimization over several devices, each sending incremental model updates to a central server. This setting is considered, for instance, in federated learning. Various schemes have been designed to compress the model updates in order to reduce the overall communication cost. However, existing methods suffer from a significant slowdown due to additional variance coming from the compression operator and as a result, only converge sublinearly. What is needed is a variance reduction technique for taming the variance introduced by compression. We propose the first methods that achieve linear convergence for arbitrary compression operators. For strongly convex functions with condition number κ, distributed among n machines with a finite-sum structure, each worker having less than m components, we also (i) give analysis for the weakly convex and the non-convex cases and (ii) verify in experiments that our novel variance reduced schemes are more efficient than the baselines. 
Moreover, we show theoretically that as the number of devices increases, higher compression levels are possible without this affecting the overall number of communications in comparison with methods that do not perform any compression. This leads to a significant reduction in communication cost. Our general analysis allows to pick the most suitable compression for each problem, finding the right balance between additional variance and communication savings. Finally, we also (iii) give analysis for arbitrary quantized updates.\",\"PeriodicalId\":124811,\"journal\":{\"name\":\"Optimization Methods and Software\",\"volume\":\"128 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-09-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"12\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Optimization Methods and Software\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1080/10556788.2022.2117355\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Optimization Methods and Software","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1080/10556788.2022.2117355","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 12

Abstract

We consider distributed optimization over several devices, each sending incremental model updates to a central server. This setting arises, for instance, in federated learning. Various schemes have been designed to compress the model updates in order to reduce the overall communication cost. However, existing methods suffer from a significant slowdown due to the additional variance coming from the compression operator and, as a result, only converge sublinearly. What is needed is a variance reduction technique for taming the variance introduced by compression. We propose the first methods that achieve linear convergence for arbitrary compression operators. For strongly convex functions with condition number κ, distributed among n machines with a finite-sum structure, each worker having fewer than m components, we also (i) give analysis for the weakly convex and the non-convex cases and (ii) verify in experiments that our novel variance-reduced schemes are more efficient than the baselines. Moreover, we show theoretically that as the number of devices increases, higher compression levels become possible without affecting the overall number of communications, in comparison with methods that do not perform any compression. This leads to a significant reduction in communication cost. Our general analysis allows us to pick the most suitable compression for each problem, finding the right balance between additional variance and communication savings. Finally, we also (iii) give analysis for arbitrary quantized updates.
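
To make the compression-variance issue and its remedy concrete, here is a minimal, self-contained Python sketch; it is not the paper's algorithm or notation. It assumes an unbiased random-k sparsification compressor and a shifted-compression update in which each worker keeps a local shift h_i and compresses only the difference between its current gradient and that shift. All names and step-size choices (compress_rand_k, gamma, alpha) and the toy least-squares data are illustrative assumptions.

```python
import numpy as np


def compress_rand_k(v, k, rng):
    """Unbiased random-k sparsification: keep k coordinates chosen uniformly at
    random and rescale by d/k, so that E[C(v)] = v. The rescaling makes the
    compressor unbiased but adds variance proportional to ||v||^2."""
    d = v.size
    out = np.zeros_like(v)
    idx = rng.choice(d, size=k, replace=False)
    out[idx] = v[idx] * (d / k)
    return out


# Toy strongly convex finite-sum problem split across n "workers":
# f(x) = (1/n) * sum_i f_i(x),  f_i(x) = (1/(2m)) * ||A_i x - b_i||^2.
rng = np.random.default_rng(0)
n, d, m, k = 10, 50, 20, 5
A = [rng.standard_normal((m, d)) for _ in range(n)]
b = [rng.standard_normal(m) for _ in range(n)]


def local_grad(i, x):
    return A[i].T @ (A[i] @ x - b[i]) / m


x = np.zeros(d)
h = [np.zeros(d) for _ in range(n)]  # per-worker shifts (local gradient estimates)
gamma = 0.05                         # illustrative server step size
alpha = k / d                        # illustrative shift step size (1/(omega+1) for rand-k)

for t in range(2000):
    g_hat = np.zeros(d)
    for i in range(n):
        # Each worker compresses the *difference* between its gradient and its
        # shift. As h_i approaches the local gradient at the optimum, the
        # compressed message (and hence the compression variance) shrinks,
        # which is what allows a linear rate instead of a sublinear one.
        delta_i = compress_rand_k(local_grad(i, x) - h[i], k, rng)
        g_hat += (h[i] + delta_i) / n
        h[i] += alpha * delta_i
    # In a real deployment the server only needs the average shift and the
    # aggregated compressed deltas; here everything lives in one process.
    x -= gamma * g_hat

full_grad = sum(local_grad(i, x) for i in range(n)) / n
print(f"||grad f(x)|| after {t + 1} rounds: {np.linalg.norm(full_grad):.2e}")
```

With k = d the compressor is the identity and the loop reduces to plain distributed gradient descent; smaller k trades extra variance for cheaper messages, which is the balance the abstract refers to. The "double" variance reduction in the title additionally targets the stochastic-gradient variance coming from the finite-sum structure on each worker, which this full-gradient sketch does not model.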