{"title":"DunceCap: Query Plans Using Generalized Hypertree Decompositions","authors":"Susan Tu, C. Ré","doi":"10.1145/2723372.2764946","DOIUrl":null,"url":null,"abstract":"Joins are central to data processing. However, traditional query plans for joins, which are based on choosing the order of pairwise joins, are provably suboptimal. They often perform poorly on cyclic graph queries, which have become increasingly important to modern data analytics. Other join algorithms exist: Yannakakis', for example, operates on acyclic queries in runtime proportional to the input size plus the output size \\cite{yannakakis}. More recently, Ngo et al. published a join algorithm that is optimal on worst-case inputs \\cite{worst}. My contribution is to explore query planning using these join algorithms. In our approach, every query plan can be viewed as a generalized hypertree decomposition (GHD). We score each GHD using the minimal fractional hypertree width, which Ngo et al. show allows us to bound its worst-case runtime. We benchmark our plans using datasets from the Stanford Large Network Dataset Collection \\cite{dataset} and find that our performance compares favorably against that of LogicBlox, a commercial system that implements a worst-case optimal join algorithm.","PeriodicalId":168391,"journal":{"name":"Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data","volume":"292 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"26","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2723372.2764946","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 26
Abstract
Joins are central to data processing. However, traditional query plans for joins, which are based on choosing an order of pairwise joins, are provably suboptimal. They often perform poorly on cyclic graph queries, which have become increasingly important to modern data analytics. Other join algorithms exist: Yannakakis's algorithm, for example, runs on acyclic queries in time proportional to the input size plus the output size \cite{yannakakis}. More recently, Ngo et al. published a join algorithm that is optimal on worst-case inputs \cite{worst}. Our contribution is to explore query planning using these join algorithms. In our approach, every query plan can be viewed as a generalized hypertree decomposition (GHD). We score each GHD by its fractional hypertree width, which, as Ngo et al. show, bounds its worst-case runtime. We benchmark our plans using datasets from the Stanford Large Network Dataset Collection \cite{dataset} and find that our performance compares favorably against that of LogicBlox, a commercial system that implements a worst-case optimal join algorithm.
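To make the scoring step concrete, the following is a minimal sketch (not the paper's implementation) of how the fractional hypertree width of a single-bag GHD can be computed: it is the bag's fractional edge cover number, obtained from a small linear program. The triangle query R(a,b) ⋈ S(b,c) ⋈ T(a,c), the helper name `fractional_edge_cover`, and the use of `scipy.optimize.linprog` are illustrative assumptions; the optimal value 3/2 corresponds to the familiar O(N^{3/2}) worst-case bound for the triangle query.

```python
# Illustrative sketch only: the fractional hypertree width of a GHD is the
# maximum, over its bags, of the bag's fractional edge cover number, which
# can be computed with a small linear program.
from scipy.optimize import linprog

def fractional_edge_cover(vertices, edges):
    """Minimize the total edge weight subject to covering every vertex with weight >= 1."""
    c = [1.0] * len(edges)  # objective: sum of edge weights
    # Coverage constraints: for each vertex v, sum of weights of edges containing v >= 1,
    # written as -sum(...) <= -1 to match linprog's A_ub @ x <= b_ub form.
    A_ub = [[-1.0 if v in e else 0.0 for e in edges] for v in vertices]
    b_ub = [-1.0] * len(vertices)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * len(edges))
    return res.fun

# Single-bag GHD for the triangle query R(a,b), S(b,c), T(a,c):
# the bag {a, b, c} must be fractionally covered by the three binary edges.
width = fractional_edge_cover(["a", "b", "c"], [{"a", "b"}, {"b", "c"}, {"a", "c"}])
print(width)  # 1.5, i.e. a worst-case runtime bound of O(N^{3/2}) by the AGM bound
```

For a GHD with several bags, the same routine would be applied to each bag and the maximum taken as the width, which is the quantity minimized when choosing among candidate plans.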