Tom J. Ameloot, Gaetano Geck, Bas Ketsman, F. Neven, T. Schwentick
{"title":"Data partitioning for single-round multi-join evaluation in massively parallel systems","authors":"Tom J. Ameloot, Gaetano Geck, Bas Ketsman, F. Neven, T. Schwentick","doi":"10.1145/2949741.2949750","DOIUrl":null,"url":null,"abstract":"A dominant cost for query evaluation in modern massively distributed systems is the number of communication rounds. For this reason, there is a growing interest in single-round multiway join algorithms where data is first reshuffled over many servers and then evaluated in a parallel but communication- free way. The reshuffling itself is specified as a distribution policy. We introduce a correctness condition, called parallel-correctness, for the evaluation of queries w.r.t. a distribution policy. We provide a semantical characterization for when conjunctive queries (and extensions thereof) are parallel-correct and give matching complexity bounds for the associated decision problem.\n Motivated by scenarios for workload optimization, we further consider the problem of parallel-correctness transfer from a query Q to a query Q0, that is, whether Q0 is parallelcorrect for all distribution policies for which Q is parallelcorrect. In this case, Q0 can always be evaluated after Q without repartitioning the data. We provide a semantical characterization for parallel-correctness transfer and provide matching complexity bounds for the associated decision problem for conjunctive queries (and extensions). Finally, we investigate restrictions of queries and families of distribution policies with better complexities, including, for instance, the Hypercube distributions.","PeriodicalId":21740,"journal":{"name":"SIGMOD Rec.","volume":"23 1","pages":"33-40"},"PeriodicalIF":0.0000,"publicationDate":"2016-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"SIGMOD Rec.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2949741.2949750","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 10
Abstract
A dominant cost for query evaluation in modern massively distributed systems is the number of communication rounds. For this reason, there is a growing interest in single-round multiway join algorithms where data is first reshuffled over many servers and then evaluated in a parallel but communication- free way. The reshuffling itself is specified as a distribution policy. We introduce a correctness condition, called parallel-correctness, for the evaluation of queries w.r.t. a distribution policy. We provide a semantical characterization for when conjunctive queries (and extensions thereof) are parallel-correct and give matching complexity bounds for the associated decision problem.
Motivated by scenarios for workload optimization, we further consider the problem of parallel-correctness transfer from a query Q to a query Q0, that is, whether Q0 is parallelcorrect for all distribution policies for which Q is parallelcorrect. In this case, Q0 can always be evaluated after Q without repartitioning the data. We provide a semantical characterization for parallel-correctness transfer and provide matching complexity bounds for the associated decision problem for conjunctive queries (and extensions). Finally, we investigate restrictions of queries and families of distribution policies with better complexities, including, for instance, the Hypercube distributions.