Tom J. Ameloot, Gaetano Geck, Bas Ketsman, F. Neven, T. Schwentick
SIGMOD Record, pp. 33-40, published 2016-06-02. DOI: 10.1145/2949741.2949750
Data partitioning for single-round multi-join evaluation in massively parallel systems
A dominant cost for query evaluation in modern massively distributed systems is the number of communication rounds. For this reason, there is growing interest in single-round multiway join algorithms, where data is first reshuffled over many servers and then evaluated in a parallel but communication-free way. The reshuffling itself is specified as a distribution policy. We introduce a correctness condition, called parallel-correctness, for the evaluation of queries w.r.t. a distribution policy. We provide a semantical characterization of when conjunctive queries (and extensions thereof) are parallel-correct and give matching complexity bounds for the associated decision problem.
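As an illustration of the definition, the sketch below assumes a conjunctive query is given as a head (tuple of variables) plus a body (list of `(relation, variables)` atoms), and that a distribution policy is any function mapping a fact to the set of servers that receive it. The names `evaluate_cq`, `distribute`, `parallel_correct_on`, `by_join_key` and `by_other_key` are hypothetical, not taken from the paper. The check compares the centralized result Q(I) with the union of the communication-free local results on one instance; it only tests a single instance, whereas parallel-correctness as studied here quantifies over all instances.

```python
from itertools import product

def evaluate_cq(head, body, facts):
    """Naive CQ evaluation: try every valuation of the body variables over the active domain."""
    variables = sorted({v for _, args in body for v in args})
    domain = sorted({c for _, args in facts for c in args})
    result = set()
    for values in product(domain, repeat=len(variables)):
        val = dict(zip(variables, values))
        # Keep the valuation only if every body atom maps to a fact that is present.
        if all((rel, tuple(val[v] for v in args)) in facts for rel, args in body):
            result.add(tuple(val[v] for v in head))
    return result

def distribute(facts, policy, servers):
    """Reshuffle: each server receives exactly the facts the policy assigns to it."""
    local = {s: set() for s in servers}
    for f in facts:
        for s in policy(f):
            local[s].add(f)
    return local

def parallel_correct_on(head, body, facts, policy, servers):
    """Compare centralized Q(I) with the union of the local results after one reshuffle."""
    central = evaluate_cq(head, body, facts)
    local = distribute(facts, policy, servers)
    union = set().union(*(evaluate_cq(head, body, local[s]) for s in servers))
    return central == union

# Q(x, z) :- R(x, y), S(y, z) over two servers; facts are (relation, constants).
head, body = ("x", "z"), [("R", ("x", "y")), ("S", ("y", "z"))]
I = {("R", (1, 2)), ("S", (2, 4)), ("R", (3, 5)), ("S", (5, 6))}
servers = [0, 1]

def by_join_key(fact):
    # Hypothetical policy: route both relations on the join variable y.
    rel, args = fact
    return {args[1] % 2} if rel == "R" else {args[0] % 2}

def by_other_key(fact):
    # Hypothetical policy: route R on x and S on z, so join partners may be separated.
    rel, args = fact
    return {args[0] % 2} if rel == "R" else {args[1] % 2}

print(parallel_correct_on(head, body, I, by_join_key, servers))   # True
print(parallel_correct_on(head, body, I, by_other_key, servers))  # False
```

Under `by_other_key`, R(1, 2) lands on server 1 and S(2, 4) on server 0, so the answer (1, 4) is produced by no server: the facts required by that valuation never meet, which is exactly what parallel-correctness rules out.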
Motivated by scenarios for workload optimization, we further consider the problem of parallel-correctness transfer from a query Q to a query Q′, that is, whether Q′ is parallel-correct for all distribution policies for which Q is parallel-correct. In that case, Q′ can always be evaluated after Q without repartitioning the data. We provide a semantical characterization of parallel-correctness transfer and give matching complexity bounds for the associated decision problem for conjunctive queries (and extensions). Finally, we investigate restrictions of queries and families of distribution policies with better complexities, including, for instance, the Hypercube distributions.
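The Hypercube family mentioned above can be sketched as follows, assuming the usual construction: servers form a grid with one dimension per query variable, and a fact matching a body atom is sent to every server whose coordinates agree with the hashed values of that atom's variables, with the remaining coordinates ranging freely. The function name `hypercube_policy`, the bucket counts, and the modulo hash (which assumes integer constants) are illustrative choices, not the paper's definitions. For the triangle query, the three facts of any valuation x=a, y=b, z=c all reach the server (h_x(a), h_y(b), h_z(c)), which is why a Hypercube distribution built for a query is parallel-correct for that query.

```python
from itertools import product

def hypercube_policy(body, buckets):
    """Build a Hypercube-style distribution policy for a CQ body.

    body    -- list of (relation, variables) atoms
    buckets -- dict: variable -> number of buckets along its dimension
    Returns (servers, policy); servers are coordinate tuples in sorted-variable order.
    """
    dims = sorted(buckets)  # fixed order of the grid dimensions
    servers = list(product(*(range(buckets[v]) for v in dims)))

    def h(var, const):
        # Illustrative hash for integer constants.
        return const % buckets[var]

    def policy(fact):
        rel, args = fact
        targets = set()
        for atom_rel, vars_ in body:  # a fact may match several atoms (self-joins)
            if atom_rel != rel or len(vars_) != len(args):
                continue
            binding = {}
            if any(binding.setdefault(v, c) != c for v, c in zip(vars_, args)):
                continue  # repeated variable bound to two different constants
            # Coordinates fixed by the atom's variables; other dimensions are wildcards.
            choices = [[h(v, binding[v])] if v in binding else range(buckets[v])
                       for v in dims]
            targets.update(product(*choices))
        return targets

    return servers, policy

# Triangle query body: E(x, y), E(y, z), E(z, x), with 2 buckets per variable.
body = [("E", ("x", "y")), ("E", ("y", "z")), ("E", ("z", "x"))]
servers, policy = hypercube_policy(body, {"x": 2, "y": 2, "z": 2})
edges = {("E", (1, 2)), ("E", (2, 3)), ("E", (3, 1))}

# The valuation x=1, y=2, z=3 should meet at server (1 % 2, 2 % 2, 3 % 2).
meeting = (1, 0, 1)
print(len(servers))                            # 8
print(all(meeting in policy(f) for f in edges))  # True
```

The design trade-off is visible in the policy: a fact is replicated along every dimension whose variable does not occur in the matching atom, so Hypercube pays extra communication in its single round in exchange for guaranteeing that every valuation's facts meet somewhere.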