{"title":"集群上迭代ML作业的主动同步与部分处理","authors":"Shaoqi Wang, Wei Chen, Aidi Pi, Xiaobo Zhou","doi":"10.1145/3274808.3274828","DOIUrl":null,"url":null,"abstract":"Executing distributed machine learning (ML) jobs on Spark follows Bulk Synchronous Parallel (BSP) model, where parallel tasks execute the same iteration at the same time and the generated updates must be synchronized on parameters when all tasks are finished. However, the parallel tasks rarely have the same execution time due to sparse data so that the synchronization has to wait for tasks finished late. Moreover, running Spark on heterogeneous clusters makes it even worse because of stragglers, where the synchronization is significantly delayed by the slowest task. This paper attacks the fundamental BSP model that supports iterative ML jobs. We propose and develop a novel BSP-based Aggressive synchronization (A-BSP) model based on the convergent property of iterative ML algorithms, by allowing the algorithm to use the updates generated based on partial input data for synchronization. Specifically, when the fastest task completes, A-BSP fetches the current updates generated by the rest tasks that have partially processed their input data to push for aggressive synchronization. Furthermore, unprocessed data is prioritized for processing in the subsequent iterations to ensure algorithm convergence rate. Theoretically, we prove the algorithm convergence for gradient descent under A-BSP model. We have implemented A-BSP as a light-weight BSP-compatible mechanism in Spark and performed evaluations with various ML jobs. Experimental results show that compared to BSP, A-BSP speeds up the execution by up to 2.36x. We have also extended A-BSP onto Petuum platform and compared to the Stale Synchronous Parallel (SSP) and Asynchronous Synchronous Parallel (ASP) models. A-BSP performs better than SSP and ASP for gradient descent based jobs. It also outperforms SSP for jobs on physical heterogeneous clusters.","PeriodicalId":167957,"journal":{"name":"Proceedings of the 19th International Middleware Conference","volume":"19 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"15","resultStr":"{\"title\":\"Aggressive Synchronization with Partial Processing for Iterative ML Jobs on Clusters\",\"authors\":\"Shaoqi Wang, Wei Chen, Aidi Pi, Xiaobo Zhou\",\"doi\":\"10.1145/3274808.3274828\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Executing distributed machine learning (ML) jobs on Spark follows Bulk Synchronous Parallel (BSP) model, where parallel tasks execute the same iteration at the same time and the generated updates must be synchronized on parameters when all tasks are finished. However, the parallel tasks rarely have the same execution time due to sparse data so that the synchronization has to wait for tasks finished late. Moreover, running Spark on heterogeneous clusters makes it even worse because of stragglers, where the synchronization is significantly delayed by the slowest task. This paper attacks the fundamental BSP model that supports iterative ML jobs. We propose and develop a novel BSP-based Aggressive synchronization (A-BSP) model based on the convergent property of iterative ML algorithms, by allowing the algorithm to use the updates generated based on partial input data for synchronization. 
Specifically, when the fastest task completes, A-BSP fetches the current updates generated by the rest tasks that have partially processed their input data to push for aggressive synchronization. Furthermore, unprocessed data is prioritized for processing in the subsequent iterations to ensure algorithm convergence rate. Theoretically, we prove the algorithm convergence for gradient descent under A-BSP model. We have implemented A-BSP as a light-weight BSP-compatible mechanism in Spark and performed evaluations with various ML jobs. Experimental results show that compared to BSP, A-BSP speeds up the execution by up to 2.36x. We have also extended A-BSP onto Petuum platform and compared to the Stale Synchronous Parallel (SSP) and Asynchronous Synchronous Parallel (ASP) models. A-BSP performs better than SSP and ASP for gradient descent based jobs. It also outperforms SSP for jobs on physical heterogeneous clusters.\",\"PeriodicalId\":167957,\"journal\":{\"name\":\"Proceedings of the 19th International Middleware Conference\",\"volume\":\"19 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-11-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"15\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 19th International Middleware Conference\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3274808.3274828\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 19th International Middleware Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3274808.3274828","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Aggressive Synchronization with Partial Processing for Iterative ML Jobs on Clusters
Executing distributed machine learning (ML) jobs on Spark follows the Bulk Synchronous Parallel (BSP) model, in which parallel tasks execute the same iteration at the same time and the updates they generate are synchronized onto the model parameters only after all tasks finish. In practice, however, parallel tasks rarely take the same time because of data sparsity, so synchronization must wait for the tasks that finish last. Running Spark on heterogeneous clusters makes this worse: stragglers delay synchronization until the slowest task completes. This paper attacks the fundamental BSP model that underlies iterative ML jobs. Exploiting the convergent nature of iterative ML algorithms, we propose and develop a novel BSP-based Aggressive synchronization (A-BSP) model that allows an algorithm to synchronize using updates generated from partially processed input data. Specifically, when the fastest task completes, A-BSP fetches the current updates generated by the remaining tasks, which have processed only part of their input data, and pushes an aggressive synchronization. Unprocessed data is then prioritized in subsequent iterations to preserve the algorithm's convergence rate. Theoretically, we prove the convergence of gradient descent under the A-BSP model. We have implemented A-BSP as a lightweight, BSP-compatible mechanism in Spark and evaluated it with various ML jobs. Experimental results show that, compared to BSP, A-BSP speeds up execution by up to 2.36x. We have also extended A-BSP to the Petuum platform and compared it against the Stale Synchronous Parallel (SSP) and Asynchronous Parallel (ASP) models. A-BSP outperforms SSP and ASP on gradient-descent-based jobs, and it also outperforms SSP on physically heterogeneous clusters.
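To make the mechanism concrete, below is a minimal Python sketch of A-BSP semantics as described in the abstract: each worker accumulates per-record gradient updates over its partition, synchronization is triggered as soon as the fastest worker finishes a full pass, the partial updates of stragglers are folded into the global update, and records that were cut off are queued first for the next pass. This is an illustrative simulation under simplifying assumptions, not the authors' Spark implementation; all names (`Worker`, `a_bsp_iteration`, the toy least-squares gradient, the per-tick `speeds`) are hypothetical.

```python
import numpy as np

class Worker:
    """Holds one data partition; processes records in queue order."""
    def __init__(self, data):
        self.data = data                      # list of (x, y) records
        self.queue = list(range(len(data)))   # processing order for next pass

    def start_iteration(self, params):
        self.params = params.copy()
        self.update = np.zeros_like(params)
        self.done = []                        # indices processed this pass

    def process_some(self, budget):
        """Process up to `budget` records; True once the pass is complete."""
        for _ in range(min(budget, len(self.queue))):
            i = self.queue.pop(0)
            x, y = self.data[i]
            # Toy least-squares gradient for a single record (an assumption
            # standing in for the job's real update function).
            self.update += (self.params @ x - y) * x
            self.done.append(i)
        return not self.queue

    def end_iteration(self):
        # A-BSP: records still in `queue` were cut off by the aggressive
        # synchronization, so they go to the head of the next pass.
        self.queue = self.queue + self.done

def a_bsp_iteration(workers, params, speeds, lr=0.01):
    """One A-BSP iteration: sync as soon as the fastest worker finishes."""
    for w in workers:
        w.start_iteration(params)
    finished = [False] * len(workers)
    while not any(finished):                  # cutoff = fastest task done
        finished = [w.process_some(s) for w, s in zip(workers, speeds)]
    # Aggressive synchronization: fetch current updates from ALL workers,
    # including those that have only partially processed their partitions.
    total = sum(w.update for w in workers)
    for w in workers:
        w.end_iteration()
    return params - lr * total

# Usage: three workers with heterogeneous speeds (records per tick),
# fitting a linear model on synthetic data.
rng = np.random.default_rng(0)
w_true = rng.standard_normal(4)

def make_partition(n):
    xs = rng.standard_normal((n, 4))
    return [(x, w_true @ x) for x in xs]

workers = [Worker(make_partition(32)) for _ in range(3)]
params = np.zeros(4)
for _ in range(50):
    params = a_bsp_iteration(workers, params, speeds=[1, 2, 4])
print(np.linalg.norm(params - w_true))        # error shrinks across passes
```

The sketch mirrors the two key design points of the abstract: the barrier is released by the fastest task rather than the slowest, and stragglers' leftover records are prioritized next iteration so every record is still processed regularly, which is what underpins the convergence argument for gradient descent.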