Iterative parallel data processing with stratosphere: an inside look

Stephan Ewen, Sebastian Schelter, K. Tzoumas, Daniel Warneke, V. Markl
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2013 (published 2013-06-22). DOI: 10.1145/2463676.2463693 (https://doi.org/10.1145/2463676.2463693)
Citations: 37

Abstract

Iterative algorithms occur in many domains of data analysis, such as machine learning and graph analysis. With increasing interest in running these algorithms on very large data sets, we see a need for new techniques that execute iterations in a massively parallel fashion. In prior work, we have shown how to extend a parallel data flow system and use it to run iterative algorithms efficiently in a shared-nothing environment. Our approach supports the incremental processing nature of many of these algorithms.

In this demonstration proposal, we illustrate the process of implementing, compiling, optimizing, and executing iterative algorithms on Stratosphere, using examples from graph analysis and machine learning. In the first step, we show the algorithm's code and a visualization of the resulting data flow programs. The second step shows the optimizer's execution plan choices, and the last step monitors the program's execution, visualizing the state of the operators and additional metrics such as per-iteration runtime and number of updates.

To show that the data flow abstraction makes it easy to create custom programming APIs, we also present programs written against a lightweight Pregel API that is layered on top of our system with little programming effort.
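To make the idea of incremental processing more concrete, here is a minimal sketch in plain Python. It is not Stratosphere code and does not use any Stratosphere API. It assumes a "solution set plus workset" formulation of an incremental iteration and uses connected components, a common graph-analysis example, as the algorithm. The function name `connected_components_incremental` and all variable names are hypothetical. In each iteration, only vertices whose label changed in the previous iteration send their label to their neighbours. The function also records the number of updates per iteration, which is analogous to the metric the demo visualizes.

```python
from collections import defaultdict


def connected_components_incremental(vertices, edges):
    """Label every vertex with the smallest vertex id in its connected component.

    solution: vertex -> current component id; persists across iterations
              (the "solution set" assumed by this sketch).
    workset:  vertices whose label changed in the previous iteration; only
              these propagate their label, so work shrinks as labels settle.
    Assumes every endpoint in `edges` also appears in `vertices`.
    Returns the final labels and the number of updates made in each iteration.
    """
    # Build an undirected adjacency list from the edge pairs.
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)

    solution = {v: v for v in vertices}
    workset = set(vertices)  # initially every vertex counts as "changed"
    updates_per_iteration = []

    while workset:
        # Each changed vertex proposes its current label to its neighbours;
        # keep only the smallest proposal per target vertex.
        candidates = {}
        for v in workset:
            label = solution[v]
            for n in neighbours[v]:
                if n not in candidates or label < candidates[n]:
                    candidates[n] = label

        # Apply only proposals that improve the solution set; the vertices
        # that changed form the workset of the next iteration.
        next_workset = set()
        for n, label in candidates.items():
            if label < solution[n]:
                solution[n] = label
                next_workset.add(n)

        updates_per_iteration.append(len(next_workset))
        workset = next_workset

    return solution, updates_per_iteration


if __name__ == "__main__":
    labels, updates = connected_components_incremental(
        [1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)]
    )
    print(labels, updates)  # {1: 1, 2: 1, 3: 1, 4: 4, 5: 4} [3, 1, 0]
```

Vertices whose labels stop changing drop out of the workset, so later iterations touch fewer elements. In this toy graph, the update count falls from 3 to 1 to 0. A bulk formulation that recomputes every label in every iteration would reach the same result but would not exploit this shrinking workset.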