{"title":"DunceCap: Query Plans Using Generalized Hypertree Decompositions","authors":"Susan Tu, C. Ré","doi":"10.1145/2723372.2764946","DOIUrl":null,"url":null,"abstract":"Joins are central to data processing. However, traditional query plans for joins, which are based on choosing the order of pairwise joins, are provably suboptimal. They often perform poorly on cyclic graph queries, which have become increasingly important to modern data analytics. Other join algorithms exist: Yannakakis', for example, operates on acyclic queries in runtime proportional to the input size plus the output size \\cite{yannakakis}. More recently, Ngo et al. published a join algorithm that is optimal on worst-case inputs \\cite{worst}. My contribution is to explore query planning using these join algorithms. In our approach, every query plan can be viewed as a generalized hypertree decomposition (GHD). We score each GHD using the minimal fractional hypertree width, which Ngo et al. show allows us to bound its worst-case runtime. We benchmark our plans using datasets from the Stanford Large Network Dataset Collection \\cite{dataset} and find that our performance compares favorably against that of LogicBlox, a commercial system that implements a worst-case optimal join algorithm.","PeriodicalId":168391,"journal":{"name":"Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data","volume":"292 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"26","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2723372.2764946","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 26
Abstract
Joins are central to data processing. However, traditional query plans for joins, which are based on choosing an order of pairwise joins, are provably suboptimal. They often perform poorly on cyclic graph queries, which have become increasingly important to modern data analytics. Other join algorithms exist: Yannakakis's algorithm, for example, runs on acyclic queries in time proportional to the input size plus the output size \cite{yannakakis}. More recently, Ngo et al. published a join algorithm that is optimal on worst-case inputs \cite{worst}. Our contribution is to explore query planning using these join algorithms. In our approach, every query plan can be viewed as a generalized hypertree decomposition (GHD). We score each GHD by its fractional hypertree width, which, as Ngo et al. show, bounds its worst-case runtime. We benchmark our plans using datasets from the Stanford Large Network Dataset Collection \cite{dataset} and find that our performance compares favorably against that of LogicBlox, a commercial system that implements a worst-case optimal join algorithm.
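To make the scoring step concrete, the following is a minimal sketch (not the paper's implementation) of how the fractional hypertree width of a single-bag GHD can be computed: it is the bag's fractional edge cover number, obtained from a small linear program. The triangle query R(a,b) ⋈ S(b,c) ⋈ T(a,c), the helper name `fractional_edge_cover`, and the use of `scipy.optimize.linprog` are illustrative assumptions; the optimal value 3/2 corresponds to the familiar O(N^{3/2}) worst-case bound for the triangle query.

```python
# Illustrative sketch only: the fractional hypertree width of a GHD is the
# maximum, over its bags, of the bag's fractional edge cover number, which
# can be computed with a small linear program.
from scipy.optimize import linprog

def fractional_edge_cover(vertices, edges):
    """Minimize the total edge weight subject to covering every vertex with weight >= 1."""
    c = [1.0] * len(edges)  # objective: sum of edge weights
    # Coverage constraints: for each vertex v, sum of weights of edges containing v >= 1,
    # written as -sum(...) <= -1 to match linprog's A_ub @ x <= b_ub form.
    A_ub = [[-1.0 if v in e else 0.0 for e in edges] for v in vertices]
    b_ub = [-1.0] * len(vertices)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * len(edges))
    return res.fun

# Single-bag GHD for the triangle query R(a,b), S(b,c), T(a,c):
# the bag {a, b, c} must be fractionally covered by the three binary edges.
width = fractional_edge_cover(["a", "b", "c"], [{"a", "b"}, {"b", "c"}, {"a", "c"}])
print(width)  # 1.5, i.e. a worst-case runtime bound of O(N^{3/2}) by the AGM bound
```

For a GHD with several bags, the same routine would be applied to each bag and the maximum taken as the width, which is the quantity minimized when choosing among candidate plans.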