火枪手:在数据处理系统中,人人为我,我为人人

Ionel Gog, Malte Schwarzkopf, Natacha Crooks, Matthew P. Grosvenor, Allen Clement, S. Hand
{"title":"火枪手:在数据处理系统中,人人为我,我为人人","authors":"Ionel Gog, Malte Schwarzkopf, Natacha Crooks, Matthew P. Grosvenor, Allen Clement, S. Hand","doi":"10.1145/2741948.2741968","DOIUrl":null,"url":null,"abstract":"Many systems for the parallel processing of big data are available today. Yet, few users can tell by intuition which system, or combination of systems, is \"best\" for a given workflow. Porting workflows between systems is tedious. Hence, users become \"locked in\", despite faster or more efficient systems being available. This is a direct consequence of the tight coupling between user-facing front-ends that express workflows (e.g., Hive, SparkSQL, Lindi, GraphLINQ) and the back-end execution engines that run them (e.g., MapReduce, Spark, PowerGraph, Naiad). We argue that the ways that workflows are defined should be decoupled from the manner in which they are executed. To explore this idea, we have built Musketeer, a workflow manager which can dynamically map front-end workflow descriptions to a broad range of back-end execution engines. Our prototype maps workflows expressed in four high-level query languages to seven different popular data processing systems. Musketeer speeds up realistic workflows by up to 9x by targeting different execution engines, without requiring any manual effort. Its automatically generated back-end code comes within 5%--30% of the performance of hand-optimized implementations.","PeriodicalId":119291,"journal":{"name":"Proceedings of the Tenth European Conference on Computer Systems","volume":"35 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"108","resultStr":"{\"title\":\"Musketeer: all for one, one for all in data processing systems\",\"authors\":\"Ionel Gog, Malte Schwarzkopf, Natacha Crooks, Matthew P. Grosvenor, Allen Clement, S. Hand\",\"doi\":\"10.1145/2741948.2741968\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Many systems for the parallel processing of big data are available today. Yet, few users can tell by intuition which system, or combination of systems, is \\\"best\\\" for a given workflow. Porting workflows between systems is tedious. Hence, users become \\\"locked in\\\", despite faster or more efficient systems being available. This is a direct consequence of the tight coupling between user-facing front-ends that express workflows (e.g., Hive, SparkSQL, Lindi, GraphLINQ) and the back-end execution engines that run them (e.g., MapReduce, Spark, PowerGraph, Naiad). We argue that the ways that workflows are defined should be decoupled from the manner in which they are executed. To explore this idea, we have built Musketeer, a workflow manager which can dynamically map front-end workflow descriptions to a broad range of back-end execution engines. Our prototype maps workflows expressed in four high-level query languages to seven different popular data processing systems. Musketeer speeds up realistic workflows by up to 9x by targeting different execution engines, without requiring any manual effort. Its automatically generated back-end code comes within 5%--30% of the performance of hand-optimized implementations.\",\"PeriodicalId\":119291,\"journal\":{\"name\":\"Proceedings of the Tenth European Conference on Computer Systems\",\"volume\":\"35 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-04-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"108\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the Tenth European Conference on Computer Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2741948.2741968\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Tenth European Conference on Computer Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2741948.2741968","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 108

摘要

现在有很多并行处理大数据的系统。然而,很少有用户能够凭直觉判断哪个系统或系统组合对于给定的工作流是“最好的”。在系统之间移植工作流是乏味的。因此,尽管有更快或更高效的系统可用,用户还是被“锁定”了。这是表达工作流的面向用户的前端(例如Hive, SparkSQL, Lindi, GraphLINQ)和运行它们的后端执行引擎(例如MapReduce, Spark, PowerGraph, Naiad)之间紧密耦合的直接结果。我们认为工作流定义的方式应该与工作流执行的方式分离。为了探索这个想法,我们构建了一个工作流管理器Musketeer,它可以动态地将前端工作流描述映射到广泛的后端执行引擎。我们的原型将用四种高级查询语言表示的工作流映射到七种不同的流行数据处理系统。通过针对不同的执行引擎,Musketeer将实际工作流程的速度提高了9倍,而无需任何手动工作。它自动生成的后端代码的性能只有手工优化实现的5%- 30%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Musketeer: all for one, one for all in data processing systems
Many systems for the parallel processing of big data are available today. Yet, few users can tell by intuition which system, or combination of systems, is "best" for a given workflow. Porting workflows between systems is tedious. Hence, users become "locked in", despite faster or more efficient systems being available. This is a direct consequence of the tight coupling between user-facing front-ends that express workflows (e.g., Hive, SparkSQL, Lindi, GraphLINQ) and the back-end execution engines that run them (e.g., MapReduce, Spark, PowerGraph, Naiad). We argue that the ways that workflows are defined should be decoupled from the manner in which they are executed. To explore this idea, we have built Musketeer, a workflow manager which can dynamically map front-end workflow descriptions to a broad range of back-end execution engines. Our prototype maps workflows expressed in four high-level query languages to seven different popular data processing systems. Musketeer speeds up realistic workflows by up to 9x by targeting different execution engines, without requiring any manual effort. Its automatically generated back-end code comes within 5%--30% of the performance of hand-optimized implementations.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信