MATCHA: Speeding Up Decentralized SGD via Matching Decomposition Sampling

Jianyu Wang, Anit Kumar Sahu, Zhouyi Yang, Gauri Joshi, S. Kar
{"title":"MATCHA: Speeding Up Decentralized SGD via Matching Decomposition Sampling","authors":"Jianyu Wang, Anit Kumar Sahu, Zhouyi Yang, Gauri Joshi, S. Kar","doi":"10.1109/ICC47138.2019.9123209","DOIUrl":null,"url":null,"abstract":"Decentralized stochastic gradient descent (SGD) is a promising approach to learn a machine learning model over a network of workers connected in an arbitrary topology. Although a densely-connected network topology can ensure faster convergence in terms of iterations, it incurs more communication time/delay per iteration, resulting in longer training time. In this paper, we propose a novel algorithm MATCHA to achieve a win-win in this error-runtime trade-off. MATCHA uses matching decomposition sampling of the base topology to parallelize inter-worker information exchange so as to significantly reduce communication delay. At the same time, the algorithm communicates more frequently over critical links such that it can maintain the same convergence rate as vanilla decentralized SGD. Experiments on a suite of datasets and deep neural networks validate the theoretical analysis and demonstrate the effectiveness of the proposed scheme as far as reducing communication delays is concerned.","PeriodicalId":231050,"journal":{"name":"2019 Sixth Indian Control Conference (ICC)","volume":"59 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"122","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 Sixth Indian Control Conference (ICC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICC47138.2019.9123209","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 122

Abstract

Decentralized stochastic gradient descent (SGD) is a promising approach for learning a machine learning model over a network of workers connected in an arbitrary topology. Although a densely connected network topology ensures faster convergence in terms of iterations, it incurs more communication time/delay per iteration, resulting in longer training time. In this paper, we propose a novel algorithm, MATCHA, to achieve a win-win in this error-runtime trade-off. MATCHA uses matching decomposition sampling of the base topology to parallelize inter-worker information exchange and thereby significantly reduce communication delay. At the same time, the algorithm communicates more frequently over critical links so that it maintains the same convergence rate as vanilla decentralized SGD. Experiments on a suite of datasets and deep neural networks validate the theoretical analysis and demonstrate the effectiveness of the proposed scheme in reducing communication delay.
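The abstract only sketches the mechanism, so the Python snippet below is a minimal illustration of the idea as stated there, not the authors' implementation: the base graph is split into vertex-disjoint matchings, each matching is activated independently at every iteration, and workers average their parameters only over the activated links. The greedy decomposition, the uniform activation probabilities `probs`, and the mixing weight `alpha` are placeholders introduced here for illustration; the paper instead optimizes these quantities (e.g., activating critical links more often), which the abstract does not spell out.

```python
import numpy as np

def matching_decomposition(edges):
    """Greedily peel vertex-disjoint matchings off the edge list of the base graph."""
    remaining = list(edges)
    matchings = []
    while remaining:
        used, matching, leftover = set(), [], []
        for (u, v) in remaining:
            if u not in used and v not in used:
                matching.append((u, v))
                used.update((u, v))
            else:
                leftover.append((u, v))
        matchings.append(matching)
        remaining = leftover
    return matchings

def sample_mixing_matrix(n_workers, matchings, probs, alpha, rng):
    """Activate each matching independently with its probability and build a
    symmetric, doubly stochastic gossip matrix over the activated edges."""
    W = np.eye(n_workers)
    for matching, p in zip(matchings, probs):
        if rng.random() < p:                 # this matching is active in this iteration
            for (u, v) in matching:
                W[u, u] -= alpha; W[v, v] -= alpha
                W[u, v] += alpha; W[v, u] += alpha
    return W

# Example: ring of 6 workers, scalar local parameters, one averaging step.
rng = np.random.default_rng(0)
n = 6
ring = [(i, (i + 1) % n) for i in range(n)]
matchings = matching_decomposition(ring)
probs = [0.5] * len(matchings)               # placeholder activation probabilities
deg_max = 2                                  # every node in a ring has degree 2
alpha = 1.0 / (deg_max + 1)                  # conservative weight keeping W entrywise nonnegative
x = rng.standard_normal(n)                   # one local parameter per worker
W = sample_mixing_matrix(n, matchings, probs, alpha, rng)
x = W @ x                                    # average with neighbors over activated links only
# In full decentralized SGD, each worker would also take a local gradient step here.
```

Because the matchings are vertex-disjoint sets of edges, all activated links can exchange messages in parallel, which is the source of the per-iteration communication savings described in the abstract.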