MATCHA: Speeding Up Decentralized SGD via Matching Decomposition Sampling

Jianyu Wang, Anit Kumar Sahu, Zhouyi Yang, Gauri Joshi, S. Kar
{"title":"MATCHA: Speeding Up Decentralized SGD via Matching Decomposition Sampling","authors":"Jianyu Wang, Anit Kumar Sahu, Zhouyi Yang, Gauri Joshi, S. Kar","doi":"10.1109/ICC47138.2019.9123209","DOIUrl":null,"url":null,"abstract":"Decentralized stochastic gradient descent (SGD) is a promising approach to learn a machine learning model over a network of workers connected in an arbitrary topology. Although a densely-connected network topology can ensure faster convergence in terms of iterations, it incurs more communication time/delay per iteration, resulting in longer training time. In this paper, we propose a novel algorithm MATCHA to achieve a win-win in this error-runtime trade-off. MATCHA uses matching decomposition sampling of the base topology to parallelize inter-worker information exchange so as to significantly reduce communication delay. At the same time, the algorithm communicates more frequently over critical links such that it can maintain the same convergence rate as vanilla decentralized SGD. Experiments on a suite of datasets and deep neural networks validate the theoretical analysis and demonstrate the effectiveness of the proposed scheme as far as reducing communication delays is concerned.","PeriodicalId":231050,"journal":{"name":"2019 Sixth Indian Control Conference (ICC)","volume":"59 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"122","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 Sixth Indian Control Conference (ICC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICC47138.2019.9123209","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 122

Abstract

Decentralized stochastic gradient descent (SGD) is a promising approach for learning a machine learning model over a network of workers connected in an arbitrary topology. Although a densely connected network topology ensures faster convergence in terms of iterations, it incurs more communication time/delay per iteration, resulting in longer training time. In this paper, we propose a novel algorithm, MATCHA, to achieve a win-win in this error-runtime trade-off. MATCHA uses matching decomposition sampling of the base topology to parallelize inter-worker information exchange and thereby significantly reduce communication delay. At the same time, the algorithm communicates more frequently over critical links so that it maintains the same convergence rate as vanilla decentralized SGD. Experiments on a suite of datasets and deep neural networks validate the theoretical analysis and demonstrate the effectiveness of the proposed scheme in reducing communication delay.
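The abstract only sketches the mechanism, so the Python snippet below is a minimal illustration of the idea as stated there, not the authors' implementation: the base graph is split into vertex-disjoint matchings, each matching is activated independently at every iteration, and workers average their parameters only over the activated links. The greedy decomposition, the uniform activation probabilities `probs`, and the mixing weight `alpha` are placeholders introduced here for illustration; the paper instead optimizes these quantities (e.g., activating critical links more often), which the abstract does not spell out.

```python
import numpy as np

def matching_decomposition(edges):
    """Greedily peel vertex-disjoint matchings off the edge list of the base graph."""
    remaining = list(edges)
    matchings = []
    while remaining:
        used, matching, leftover = set(), [], []
        for (u, v) in remaining:
            if u not in used and v not in used:
                matching.append((u, v))
                used.update((u, v))
            else:
                leftover.append((u, v))
        matchings.append(matching)
        remaining = leftover
    return matchings

def sample_mixing_matrix(n_workers, matchings, probs, alpha, rng):
    """Activate each matching independently with its probability and build a
    symmetric, doubly stochastic gossip matrix over the activated edges."""
    W = np.eye(n_workers)
    for matching, p in zip(matchings, probs):
        if rng.random() < p:                 # this matching is active in this iteration
            for (u, v) in matching:
                W[u, u] -= alpha; W[v, v] -= alpha
                W[u, v] += alpha; W[v, u] += alpha
    return W

# Example: ring of 6 workers, scalar local parameters, one averaging step.
rng = np.random.default_rng(0)
n = 6
ring = [(i, (i + 1) % n) for i in range(n)]
matchings = matching_decomposition(ring)
probs = [0.5] * len(matchings)               # placeholder activation probabilities
deg_max = 2                                  # every node in a ring has degree 2
alpha = 1.0 / (deg_max + 1)                  # conservative weight keeping W entrywise nonnegative
x = rng.standard_normal(n)                   # one local parameter per worker
W = sample_mixing_matrix(n, matchings, probs, alpha, rng)
x = W @ x                                    # average with neighbors over activated links only
# In full decentralized SGD, each worker would also take a local gradient step here.
```

Because the matchings are vertex-disjoint sets of edges, all activated links can exchange messages in parallel, which is the source of the per-iteration communication savings described in the abstract.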