Data partitioning for single-round multi-join evaluation in massively parallel systems

Tom J. Ameloot, Gaetano Geck, Bas Ketsman, F. Neven, T. Schwentick
SIGMOD Rec., volume 23 1, pages 33-40. Journal Article. Published 2016-06-02.
DOI: 10.1145/2949741.2949750 (https://doi.org/10.1145/2949741.2949750)
Citation count: 10

Abstract

A dominant cost for query evaluation in modern massively distributed systems is the number of communication rounds. For this reason, there is a growing interest in single-round multiway join algorithms where data is first reshuffled over many servers and then evaluated in a parallel but communication-free way. The reshuffling itself is specified as a distribution policy. We introduce a correctness condition, called parallel-correctness, for the evaluation of queries w.r.t. a distribution policy. We provide a semantical characterization for when conjunctive queries (and extensions thereof) are parallel-correct and give matching complexity bounds for the associated decision problem. Motivated by scenarios for workload optimization, we further consider the problem of parallel-correctness transfer from a query Q to a query Q', that is, whether Q' is parallel-correct for all distribution policies for which Q is parallel-correct. In this case, Q' can always be evaluated after Q without repartitioning the data. We provide a semantical characterization for parallel-correctness transfer and give matching complexity bounds for the associated decision problem for conjunctive queries (and extensions). Finally, we investigate restrictions of queries and families of distribution policies with better complexities, including, for instance, the Hypercube distributions.
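To make the notion of parallel-correctness concrete, the following is a minimal Python sketch, not the paper's algorithm. It compares the global answer of a conjunctive query with the union of the answers computed locally on each server after reshuffling. All names (`evaluate`, `distribute`, `parallel_correct_on`, the two example policies) are hypothetical, and the check covers a single database instance only, whereas the abstract's condition concerns a query and a policy in general.

```python
# A fact is (relation_name, values). A query is (head_vars, body_atoms),
# where each body atom is (relation_name, variables).

def evaluate(query, facts):
    """Naive conjunctive-query evaluation: extend valuations atom by atom."""
    head, body = query
    valuations = [{}]
    for rel, variables in body:
        extended = []
        for val in valuations:
            for frel, values in facts:
                if frel != rel or len(values) != len(variables):
                    continue
                new = dict(val)
                # Bind each variable, rejecting conflicting bindings.
                if all(new.setdefault(v, c) == c
                       for v, c in zip(variables, values)):
                    extended.append(new)
        valuations = extended
    return {tuple(val[v] for v in head) for val in valuations}

def distribute(facts, policy, servers):
    """Reshuffle: policy(fact) returns the set of servers receiving the fact."""
    chunks = {s: set() for s in servers}
    for fact in facts:
        for s in policy(fact):
            chunks[s].add(fact)
    return chunks

def parallel_correct_on(query, facts, policy, servers):
    """True if the union of local answers equals the global answer
    on this one instance (an illustrative, instance-level check)."""
    local_union = set()
    for chunk in distribute(facts, policy, servers).values():
        local_union |= evaluate(query, chunk)
    return local_union == evaluate(query, facts)

# Example query Q(x, z) :- R(x, y), S(y, z) over two servers.
query = (("x", "z"), [("R", ("x", "y")), ("S", ("y", "z"))])
facts = {("R", (1, 2)), ("S", (2, 4))}
servers = [0, 1]

def by_join_key(fact):
    """Assumed policy: hash both relations on the join variable y."""
    rel, vals = fact
    y = vals[1] if rel == "R" else vals[0]
    return {y % 2}

def by_outer_key(fact):
    """Assumed policy: hash R on x and S on z, ignoring the join variable."""
    rel, vals = fact
    key = vals[0] if rel == "R" else vals[1]
    return {key % 2}

print(parallel_correct_on(query, facts, by_join_key, servers))
print(parallel_correct_on(query, facts, by_outer_key, servers))
```

Under the first policy the two joining facts meet on the same server, so the local answers recover the global one. Under the second policy they are sent to different servers and the answer (1, 4) is lost, so on this instance the single-round evaluation is incorrect.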