{"title":"A Brief Tutorial on Distributed and Concurrent Machine Learning","authors":"Dan Alistarh","doi":"10.1145/3212734.3212798","DOIUrl":null,"url":null,"abstract":"The area of machine learning has made considerable progress over the past decade, enabled by the widespread availability of large datasets, as well as by improved algorithms and models. Given the large computational demands of machine learning workloads, parallelism, implemented either through single-node concurrency or through multi-node distribution, has been a third key ingredient to advances in machine learning. The goal of this tutorial is to provide the audience with an overview of standard distribution techniques in machine learning, with an eye towards the intriguing trade-offs between synchronization and communication costs of distributed machine learning algorithms, on the one hand, and their convergence, on the other.The tutorial will focus on parallelization strategies for the fundamental stochastic gradient descent (SGD) algorithm, which is a key tool when training machine learning models, from classical instances such as linear regression, to state-of-the-art neural network architectures. The tutorial will describe the guarantees provided by this algorithm in the sequential case, and then move on to cover both shared-memory and message-passing parallelization strategies, together with the guarantees they provide, and corresponding trade-offs. The presentation will conclude with a broad overview of ongoing research in distributed and concurrent machine learning. The tutorial will assume no prior knowledge beyond familiarity with basic concepts in algebra and analysis.","PeriodicalId":198284,"journal":{"name":"Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2018-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3212734.3212798","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 7
Abstract
The area of machine learning has made considerable progress over the past decade, enabled by the widespread availability of large datasets, as well as by improved algorithms and models. Given the large computational demands of machine learning workloads, parallelism, implemented either through single-node concurrency or through multi-node distribution, has been a third key ingredient to these advances. The goal of this tutorial is to provide the audience with an overview of standard distribution techniques in machine learning, with an eye towards the intriguing trade-offs between the synchronization and communication costs of distributed machine learning algorithms, on the one hand, and their convergence, on the other. The tutorial will focus on parallelization strategies for the fundamental stochastic gradient descent (SGD) algorithm, a key tool for training machine learning models, from classical instances such as linear regression to state-of-the-art neural network architectures. The tutorial will describe the guarantees provided by this algorithm in the sequential case, and then move on to cover both shared-memory and message-passing parallelization strategies, together with the guarantees they provide and the corresponding trade-offs. The presentation will conclude with a broad overview of ongoing research in distributed and concurrent machine learning. The tutorial assumes no prior knowledge beyond familiarity with basic concepts in algebra and analysis.
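Since the tutorial centers on SGD, a minimal sketch of the sequential algorithm on least-squares linear regression may help fix ideas. The synthetic data, step size, and batch size below are illustrative assumptions, not drawn from the tutorial itself.

```python
import numpy as np

# Minimal sketch of sequential minibatch SGD for least-squares linear
# regression. All constants (step size, batch size, problem dimensions)
# are illustrative choices.

rng = np.random.default_rng(0)

# Synthetic regression problem: y = X @ w_true + noise.
n_samples, n_features = 1_000, 10
X = rng.normal(size=(n_samples, n_features))
w_true = rng.normal(size=n_features)
y = X @ w_true + 0.01 * rng.normal(size=n_samples)

w = np.zeros(n_features)   # model parameters
step_size = 0.01           # constant learning rate
batch_size = 32

for step in range(2_000):
    # Sample a minibatch uniformly at random.
    idx = rng.integers(0, n_samples, size=batch_size)
    Xb, yb = X[idx], y[idx]
    # Stochastic gradient of the average squared loss on the minibatch.
    grad = Xb.T @ (Xb @ w - yb) / batch_size
    # SGD update step.
    w -= step_size * grad

print("estimation error:", np.linalg.norm(w - w_true))
```

A synchronous data-parallel (message-passing) variant of this loop would compute such minibatch gradients on several workers and average them before applying the update, while asynchronous shared-memory variants such as Hogwild! let threads apply updates concurrently without locking; the trade-offs between these strategies and their convergence guarantees are the subject of the tutorial.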