Nested Distributed Gradient Methods with Stochastic Computation Errors

Charikleia Iakovidou, Ermin Wei
{"title":"Nested Distributed Gradient Methods with Stochastic Computation Errors","authors":"Charikleia Iakovidou, Ermin Wei","doi":"10.1109/ALLERTON.2019.8919853","DOIUrl":null,"url":null,"abstract":"In this work, we consider the problem of a network of agents collectively minimizing a sum of convex functions. The agents in our setting can only access their local objective functions and exchange information with their immediate neighbors. Motivated by applications where computation is imperfect, including, but not limited to, empirical risk minimization (ERM) and online learning, we assume that only noisy estimates of the local gradients are available. To tackle this problem, we adapt a class of Nested Distributed Gradient methods (NEAR-DGD) to the stochastic gradient setting. These methods have minimal storage requirements, are communication aware and perform well in settings where gradient computation is costly, while communication is relatively inexpensive. We investigate the convergence properties of our method under standard assumptions for stochastic gradients, i.e. unbiasedness and bounded variance. Our analysis indicates that our method converges to a neighborhood of the optimal solution with a linear rate for local strongly convex functions and appropriate constant steplengths. We also show that distributed optimization with stochastic gradients achieves a noise reduction effect similar to mini-batching, which scales favorably with network size. Finally, we present numerical results to demonstrate the effectiveness of our method.","PeriodicalId":120479,"journal":{"name":"2019 57th Annual Allerton Conference on Communication, Control, and Computing (Allerton)","volume":"95 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 57th Annual Allerton Conference on Communication, Control, and Computing (Allerton)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ALLERTON.2019.8919853","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

In this work, we consider the problem of a network of agents collectively minimizing a sum of convex functions. The agents in our setting can only access their local objective functions and exchange information with their immediate neighbors. Motivated by applications where computation is imperfect, including, but not limited to, empirical risk minimization (ERM) and online learning, we assume that only noisy estimates of the local gradients are available. To tackle this problem, we adapt a class of Nested Distributed Gradient methods (NEAR-DGD) to the stochastic gradient setting. These methods have minimal storage requirements, are communication-aware, and perform well in settings where gradient computation is costly while communication is relatively inexpensive. We investigate the convergence properties of our method under standard assumptions on the stochastic gradients, i.e., unbiasedness and bounded variance. Our analysis indicates that our method converges linearly to a neighborhood of the optimal solution for strongly convex local functions and appropriately chosen constant steplengths. We also show that distributed optimization with stochastic gradients achieves a noise reduction effect similar to mini-batching, which scales favorably with network size. Finally, we present numerical results to demonstrate the effectiveness of our method.
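To make the setting concrete, below is a minimal illustrative sketch (not the paper's exact NEAR-DGD algorithm) of a distributed gradient iteration that alternates a local computation step using noisy gradients with nested consensus (communication) rounds over a doubly stochastic mixing matrix. The local quadratic objectives, ring-graph weights, steplength, number of consensus rounds per iteration, and additive Gaussian noise model are all assumptions chosen for illustration; the abstract alone does not specify them.

```python
import numpy as np

# Illustrative sketch only: each agent i holds a local quadratic
# f_i(x) = 0.5 * ||A_i x - b_i||^2 and can only query a noisy gradient.
rng = np.random.default_rng(0)
n_agents, dim = 5, 3
A = [rng.standard_normal((10, dim)) for _ in range(n_agents)]
b = [rng.standard_normal(10) for _ in range(n_agents)]

# Doubly stochastic mixing matrix for a ring graph (hypothetical weights).
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = 0.5
    W[i, (i - 1) % n_agents] = 0.25
    W[i, (i + 1) % n_agents] = 0.25

def noisy_grad(i, x, sigma=0.1):
    """Unbiased gradient estimate with bounded variance (additive noise)."""
    g = A[i].T @ (A[i] @ x - b[i])
    return g + sigma * rng.standard_normal(dim)

alpha = 0.01    # constant steplength (assumed value)
n_comm = 2      # nested consensus rounds per iteration (assumed value)
X = np.zeros((n_agents, dim))   # row i = agent i's current iterate

for k in range(500):
    # Computation step: each agent takes a local stochastic gradient step.
    Y = np.stack([X[i] - alpha * noisy_grad(i, X[i]) for i in range(n_agents)])
    # Communication step: repeated averaging with immediate neighbors.
    for _ in range(n_comm):
        Y = W @ Y
    X = Y

print("agent iterates after 500 iterations:\n", X)
```

With a constant steplength and noisy gradients, the agents' iterates in such a scheme settle into a neighborhood of the minimizer rather than converging exactly, and averaging across the network attenuates the gradient noise in a manner analogous to mini-batching, which is the qualitative behavior the paper analyzes.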