STRADS: a distributed framework for scheduled model parallel machine learning

Jin Kyu Kim, Qirong Ho, Seunghak Lee, Xun Zheng, Wei Dai, Garth A. Gibson, Eric P. Xing
{"title":"STRADS: a distributed framework for scheduled model parallel machine learning","authors":"Jin Kyu Kim, Qirong Ho, Seunghak Lee, Xun Zheng, Wei Dai, Garth A. Gibson, E. Xing","doi":"10.1145/2901318.2901331","DOIUrl":null,"url":null,"abstract":"Machine learning (ML) algorithms are commonly applied to big data, using distributed systems that partition the data across machines and allow each machine to read and update all ML model parameters --- a strategy known as data parallelism. An alternative and complimentary strategy, model parallelism, partitions the model parameters for non-shared parallel access and updates, and may periodically repartition the parameters to facilitate communication. Model parallelism is motivated by two challenges that data-parallelism does not usually address: (1) parameters may be dependent, thus naive concurrent updates can introduce errors that slow convergence or even cause algorithm failure; (2) model parameters converge at different rates, thus a small subset of parameters can bottleneck ML algorithm completion. We propose scheduled model parallelism (SchMP), a programming approach that improves ML algorithm convergence speed by efficiently scheduling parameter updates, taking into account parameter dependencies and uneven convergence. To support SchMP at scale, we develop a distributed framework STRADS which optimizes the throughput of SchMP programs, and benchmark four common ML applications written as SchMP programs: LDA topic modeling, matrix factorization, sparse least-squares (Lasso) regression and sparse logistic regression. By improving ML progress per iteration through SchMP programming whilst improving iteration throughput through STRADS we show that SchMP programs running on STRADS outperform non-model-parallel ML implementations: for example, SchMP LDA and SchMP Lasso respectively achieve 10x and 5x faster convergence than recent, well-established baselines.","PeriodicalId":20737,"journal":{"name":"Proceedings of the Eleventh European Conference on Computer Systems","volume":"12 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2016-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"80","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Eleventh European Conference on Computer Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2901318.2901331","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 80

Abstract

Machine learning (ML) algorithms are commonly applied to big data, using distributed systems that partition the data across machines and allow each machine to read and update all ML model parameters, a strategy known as data parallelism. An alternative and complementary strategy, model parallelism, partitions the model parameters for non-shared parallel access and updates, and may periodically repartition the parameters to facilitate communication. Model parallelism is motivated by two challenges that data parallelism does not usually address: (1) parameters may be dependent, thus naive concurrent updates can introduce errors that slow convergence or even cause algorithm failure; (2) model parameters converge at different rates, thus a small subset of parameters can bottleneck ML algorithm completion. We propose scheduled model parallelism (SchMP), a programming approach that improves ML algorithm convergence speed by efficiently scheduling parameter updates, taking into account parameter dependencies and uneven convergence. To support SchMP at scale, we develop a distributed framework, STRADS, which optimizes the throughput of SchMP programs, and we benchmark four common ML applications written as SchMP programs: LDA topic modeling, matrix factorization, sparse least-squares (Lasso) regression, and sparse logistic regression. By improving ML progress per iteration through SchMP programming while improving iteration throughput through STRADS, we show that SchMP programs running on STRADS outperform non-model-parallel ML implementations: for example, SchMP LDA and SchMP Lasso achieve 10x and 5x faster convergence, respectively, than recent, well-established baselines.
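
To make the two scheduling ideas concrete, the sketch below illustrates dependency-aware, priority-based coordinate selection for Lasso (one of the benchmarked applications) in plain Python. This is a single-machine sketch under stated assumptions, not the STRADS API: names such as `schedule_round` and `scheduled_lasso` are hypothetical. Coordinates are sampled in proportion to their recent progress (addressing uneven convergence), and candidates that are strongly correlated with already-selected coordinates are deferred to a later round (addressing parameter dependencies).

```python
# Minimal, illustrative sketch of the scheduling ideas behind SchMP
# (prioritize by convergence progress, group nearly independent parameters),
# applied to Lasso coordinate descent. NOT the STRADS API; all names here
# (schedule_round, scheduled_lasso, ...) are hypothetical.

import numpy as np

def soft_threshold(z, lam):
    """Soft-thresholding operator used by Lasso coordinate descent."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def schedule_round(priority, X_corr, batch, corr_limit, rng):
    """Pick up to `batch` coordinates: sample by priority (uneven convergence),
    then drop candidates too correlated with already-chosen ones (dependencies)."""
    p = priority + 1e-6                      # keep every coordinate reachable
    p = p / p.sum()
    candidates = rng.choice(len(p), size=min(4 * batch, len(p)),
                            replace=False, p=p)
    chosen = []
    for j in candidates:
        if all(abs(X_corr[j, k]) < corr_limit for k in chosen):
            chosen.append(j)
        if len(chosen) == batch:
            break
    return chosen

def scheduled_lasso(X, y, lam=0.1, rounds=200, batch=8, corr_limit=0.5, seed=0):
    n, d = X.shape
    rng = np.random.default_rng(seed)
    beta = np.zeros(d)
    priority = np.ones(d)                    # start with uniform priorities
    col_norm_sq = (X ** 2).sum(axis=0)
    X_corr = np.corrcoef(X, rowvar=False)    # pairwise feature correlations
    resid = y - X @ beta
    for _ in range(rounds):
        for j in schedule_round(priority, X_corr, batch, corr_limit, rng):
            # The chosen coordinates are (nearly) independent, so a real
            # system could update them in parallel; we loop for brevity.
            rho = X[:, j] @ resid + col_norm_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam * n) / col_norm_sq[j]
            resid += X[:, j] * (beta[j] - new_bj)
            priority[j] = abs(new_bj - beta[j])   # reprioritize by progress
            beta[j] = new_bj
    return beta

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((200, 50))
    true_beta = np.zeros(50); true_beta[:5] = 1.0
    y = X @ true_beta + 0.01 * rng.standard_normal(200)
    print(np.round(scheduled_lasso(X, y), 2)[:10])
```

In the paper's distributed setting, this kind of selection would be made by a scheduler process and the chosen, nearly independent parameters would be updated in parallel across workers; the sketch updates them sequentially on one machine for brevity.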