最佳连接算法与 Top-k 相结合。

Proceedings. ACM-SIGMOD International Conference on Management of Data Pub Date : 2020-06-01 DOI:10.1145/3318464.3383132

Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald

{"title":"最佳连接算法与 Top-k 相结合。","authors":"Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald","doi":"10.1145/3318464.3383132","DOIUrl":null,"url":null,"abstract":"Top-k queries have been studied intensively in the database community and they are an important means to reduce query cost when only the \"best\" or \"most interesting\" results are needed instead of the full output. While some optimality results exist, e.g., the famous Threshold Algorithm, they hold only in a fairly limited model of computation that does not account for the cost incurred by large intermediate results and hence is not aligned with typical database-optimizer cost models. On the other hand, the idea of avoiding large intermediate results is arguably the main goal of recent work on optimal join algorithms, which uses the standard RAM model of computation to determine algorithm complexity. This research has created a lot of excitement due to its promise of reducing the time complexity of join queries with cycles, but it has mostly focused on full-output computation. We argue that the two areas can and should be studied from a unified point of view in order to achieve optimality in the common model of computation for a very general class of top-k-style join queries. This tutorial has two main objectives. First, we will explore and contrast the main assumptions, concepts, and algorithmic achievements of the two research areas. Second, we will cover recent, as well as some older, approaches that emerged at the intersection to support efficient ranked enumeration of join-query results. These are related to classic work on k-shortest path algorithms and more general optimization problems, some of which dates back to the 1950s. We demonstrate that this line of research warrants renewed attention in the challenging context of ranked enumeration for general join queries.","PeriodicalId":87344,"journal":{"name":"Proceedings. ACM-SIGMOD International Conference on Management of Data","volume":"2020 ","pages":"2659-2665"},"PeriodicalIF":0.0000,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7872590/pdf/nihms-1666240.pdf","citationCount":"0","resultStr":"{\"title\":\"Optimal Join Algorithms Meet Top-k.\",\"authors\":\"Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald\",\"doi\":\"10.1145/3318464.3383132\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Top-k queries have been studied intensively in the database community and they are an important means to reduce query cost when only the \\\"best\\\" or \\\"most interesting\\\" results are needed instead of the full output. While some optimality results exist, e.g., the famous Threshold Algorithm, they hold only in a fairly limited model of computation that does not account for the cost incurred by large intermediate results and hence is not aligned with typical database-optimizer cost models. On the other hand, the idea of avoiding large intermediate results is arguably the main goal of recent work on optimal join algorithms, which uses the standard RAM model of computation to determine algorithm complexity. This research has created a lot of excitement due to its promise of reducing the time complexity of join queries with cycles, but it has mostly focused on full-output computation. We argue that the two areas can and should be studied from a unified point of view in order to achieve optimality in the common model of computation for a very general class of top-k-style join queries. This tutorial has two main objectives. First, we will explore and contrast the main assumptions, concepts, and algorithmic achievements of the two research areas. Second, we will cover recent, as well as some older, approaches that emerged at the intersection to support efficient ranked enumeration of join-query results. These are related to classic work on k-shortest path algorithms and more general optimization problems, some of which dates back to the 1950s. We demonstrate that this line of research warrants renewed attention in the challenging context of ranked enumeration for general join queries.\",\"PeriodicalId\":87344,\"journal\":{\"name\":\"Proceedings. ACM-SIGMOD International Conference on Management of Data\",\"volume\":\"2020 \",\"pages\":\"2659-2665\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7872590/pdf/nihms-1666240.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings. ACM-SIGMOD International Conference on Management of Data\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3318464.3383132\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. ACM-SIGMOD International Conference on Management of Data","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3318464.3383132","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

当只需要 "最好 "或 "最有趣 "的结果而不是全部输出结果时，Top-k 查询是降低查询成本的重要手段。虽然存在一些优化结果，例如著名的阈值算法，但这些结果只在相当有限的计算模型中成立，没有考虑大的中间结果所产生的成本，因此与典型的数据库优化器成本模型不一致。另一方面，避免大量中间结果的想法可以说是最近关于最优连接算法研究的主要目标，该研究使用标准 RAM 计算模型来确定算法复杂度。这项研究有望降低循环连接查询的时间复杂度，因此引起了广泛关注，但它主要集中在全输出计算上。我们认为，这两个领域可以而且应该从统一的角度进行研究，以便在通用计算模型中为一类非常通用的拓扑式连接查询实现最优性。本教程有两个主要目标。首先，我们将探讨和对比这两个研究领域的主要假设、概念和算法成就。其次，我们将介绍最近和以前出现的一些方法，这些方法支持对联接查询结果进行高效的排序枚举。这些方法与 k 最短路径算法和更一般的优化问题方面的经典工作有关，其中一些工作可以追溯到 20 世纪 50 年代。我们证明，在对一般连接查询进行排序枚举这一具有挑战性的背景下，这一研究方向值得重新关注。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Optimal Join Algorithms Meet Top-k.

Top-k queries have been studied intensively in the database community and they are an important means to reduce query cost when only the "best" or "most interesting" results are needed instead of the full output. While some optimality results exist, e.g., the famous Threshold Algorithm, they hold only in a fairly limited model of computation that does not account for the cost incurred by large intermediate results and hence is not aligned with typical database-optimizer cost models. On the other hand, the idea of avoiding large intermediate results is arguably the main goal of recent work on optimal join algorithms, which uses the standard RAM model of computation to determine algorithm complexity. This research has created a lot of excitement due to its promise of reducing the time complexity of join queries with cycles, but it has mostly focused on full-output computation. We argue that the two areas can and should be studied from a unified point of view in order to achieve optimality in the common model of computation for a very general class of top-k-style join queries. This tutorial has two main objectives. First, we will explore and contrast the main assumptions, concepts, and algorithmic achievements of the two research areas. Second, we will cover recent, as well as some older, approaches that emerged at the intersection to support efficient ranked enumeration of join-query results. These are related to classic work on k-shortest path algorithms and more general optimization problems, some of which dates back to the 1950s. We demonstrate that this line of research warrants renewed attention in the challenging context of ranked enumeration for general join queries.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings. ACM-SIGMOD International Conference on Management of Data

自引率

0.00%

发文量