OSP: Overlapping Computation and Communication in Parameter Server for Fast Machine Learning

Proceedings of the 48th International Conference on Parallel Processing. Pub Date: 2019-08-05. DOI: 10.1145/3337821.3337828

Haozhao Wang, Song Guo, Ruixuan Li

Citations: 15

Abstract

When run on a Parameter Server (PS) architecture, distributed Stochastic Gradient Descent (SGD) incurs significant communication delays: in every iteration, after pushing their updates, the computing nodes (workers) must wait for the global model to be sent back from the master. In this paper, we devise a new synchronization parallel mechanism named overlap synchronization parallel (OSP), which removes this waiting time by overlapping computation with communication. We prove theoretically that our mechanism achieves the same convergence rate as sequential SGD for non-convex problems. Evaluations show that our mechanism significantly outperforms state-of-the-art ones, e.g., by 4× for both AlexNet and ResNet18 in terms of convergence speed.
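To make the waiting-time argument concrete, here is a minimal cost-model sketch. It is not the paper's OSP algorithm, whose details the abstract does not give. It assumes each iteration has a fixed compute time and a fixed push-plus-pull communication time, and that in the overlapped case a worker starts its next computation without waiting for the previous communication to finish (which implies it may compute on a slightly older model). The function names `sequential_time` and `overlapped_time` are hypothetical.

```python
def sequential_time(iterations: int, compute: float, comm: float) -> float:
    """Wall-clock time when each iteration computes, then waits for push + pull."""
    return iterations * (compute + comm)


def overlapped_time(iterations: int, compute: float, comm: float) -> float:
    """Wall-clock time when communication runs in the background of the next compute.

    Assumed two-stage pipeline: the first compute cannot be hidden, every
    later step costs max(compute, comm), and the last communication still
    has to drain after the final compute.
    """
    if iterations == 0:
        return 0.0
    return compute + (iterations - 1) * max(compute, comm) + comm


def speedup(iterations: int, compute: float, comm: float) -> float:
    """Ratio of sequential to overlapped time under the toy model above."""
    return sequential_time(iterations, compute, comm) / overlapped_time(
        iterations, compute, comm
    )
```

Under this toy model the per-iteration cost drops from `compute + comm` to `max(compute, comm)`, so the gain in raw iteration time is at most 2× and is largest when the two costs are equal. The 4× figure in the abstract refers to measured convergence speed against other mechanisms and is not something this model reproduces.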



