EventGraD: Event-Triggered Communication in Parallel Stochastic Gradient Descent

IF 65.3 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Soumyadip Ghosh, V. Gupta
DOI: 10.1109/MLHPCAI4S51975.2020.00008
Journal: Foundations and Trends in Machine Learning, 41(1), pp. 1-8
Publication date: 2020-11-01
Publication type: Journal Article
Citations: 3

Abstract

Communication in parallel systems consumes a significant amount of time and energy, which often turns out to be a bottleneck in distributed machine learning. In this paper, we present EventGraD, an algorithm with event-triggered communication in parallel stochastic gradient descent. The main idea of this algorithm is to relax the requirement of communicating at every epoch to communicating only in certain epochs when necessary. In particular, the parameters are communicated only when the change in their values exceeds a threshold. The threshold for a parameter is chosen adaptively based on the rate of change of that parameter, which ensures that the algorithm can be applied to different models on different datasets without modification. We focus on data-parallel training of a popular convolutional neural network on the MNIST dataset and show that EventGraD can reduce the communication load by up to 70% while retaining the same level of accuracy.
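The triggering rule described in the abstract can be sketched in a few lines: broadcast a parameter block only when it has drifted from its last-sent value by more than an adaptive threshold. This is a minimal illustration, not the authors' implementation; the function names, the norm-based trigger, and the exponential-smoothing threshold update are illustrative assumptions.

```python
import numpy as np

def should_send(current, last_sent, threshold):
    # Trigger communication only when the parameters have drifted
    # from their last-broadcast snapshot by more than the threshold.
    return np.linalg.norm(current - last_sent) > threshold

def update_threshold(threshold, rate_of_change, gamma=0.5):
    # Illustrative adaptive rule: smooth the threshold toward the
    # recent rate of change so it tracks the model's own dynamics.
    return gamma * threshold + (1.0 - gamma) * rate_of_change

rng = np.random.default_rng(0)
w = rng.normal(size=8)       # local model parameters
last_sent = w.copy()         # snapshot last broadcast to neighbors
threshold = 0.1
sends = 0
for epoch in range(20):
    prev = w.copy()
    w = w - 0.05 * rng.normal(size=8)   # stand-in for an SGD step
    rate = np.linalg.norm(w - prev)     # rate of change this epoch
    threshold = update_threshold(threshold, rate)
    if should_send(w, last_sent, threshold):
        last_sent = w.copy()            # "communicate": refresh snapshot
        sends += 1
print(f"communicated in {sends} of 20 epochs")
```

Because the snapshot is refreshed only on a send, small fluctuations are absorbed locally and only sustained drift triggers communication, which is how epochs without communication are skipped.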
Source journal: Foundations and Trends in Machine Learning (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)
CiteScore: 108.50
Self-citation rate: 0.00%
Articles per year: 5
Journal description: Each issue of Foundations and Trends® in Machine Learning comprises a monograph of at least 50 pages written by research leaders in the field. We aim to publish monographs that provide an in-depth, self-contained treatment of topics where there have been significant new developments. Typically, this means that the monographs we publish will contain a significant level of mathematical detail (to describe the central methods and/or theory for the topic at hand), and will not eschew these details by simply pointing to existing references. Literature surveys and original research papers do not fall within these aims.