Optimal Join Algorithms Meet Top-k.

Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald
Proceedings. ACM-SIGMOD International Conference on Management of Data, vol. 2020, pp. 2659-2665. Published 2020-06-01.
DOI: 10.1145/3318464.3383132 (https://doi.org/10.1145/3318464.3383132)
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7872590/pdf/nihms-1666240.pdf

Abstract

Top-k queries have been studied intensively in the database community and they are an important means to reduce query cost when only the "best" or "most interesting" results are needed instead of the full output. While some optimality results exist, e.g., the famous Threshold Algorithm, they hold only in a fairly limited model of computation that does not account for the cost incurred by large intermediate results and hence is not aligned with typical database-optimizer cost models. On the other hand, the idea of avoiding large intermediate results is arguably the main goal of recent work on optimal join algorithms, which uses the standard RAM model of computation to determine algorithm complexity. This research has created a lot of excitement due to its promise of reducing the time complexity of join queries with cycles, but it has mostly focused on full-output computation. We argue that the two areas can and should be studied from a unified point of view in order to achieve optimality in the common model of computation for a very general class of top-k-style join queries. This tutorial has two main objectives. First, we will explore and contrast the main assumptions, concepts, and algorithmic achievements of the two research areas. Second, we will cover recent, as well as some older, approaches that emerged at the intersection to support efficient ranked enumeration of join-query results. These are related to classic work on k-shortest path algorithms and more general optimization problems, some of which dates back to the 1950s. We demonstrate that this line of research warrants renewed attention in the challenging context of ranked enumeration for general join queries.
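To make the notion of ranked enumeration concrete, here is a minimal sketch, not the method presented in the tutorial. It assumes a join of just two relations, R(a, b) and S(b, c), each tuple carrying a weight, with results ranked by ascending sum of weights. The function name ranked_join and the sample data are hypothetical. Each R tuple contributes a sorted stream of join partners, and a priority queue merges these streams lazily, so the first k results are available without materializing the full join output.

```python
import heapq
from collections import defaultdict
from itertools import islice


def ranked_join(R, S):
    """Yield (total_weight, a, b, c) for R(a, b) JOIN S(b, c), cheapest first.

    R holds tuples (a, b, weight); S holds tuples (b, c, weight).
    Illustrative only: two relations, ranking by sum of weights.
    """
    # Group S by the join attribute and sort each group by weight.
    by_b = defaultdict(list)
    for b, c, w in S:
        by_b[b].append((w, c))
    for group in by_b.values():
        group.sort()

    # Seed the queue with the cheapest partner of every R tuple that joins.
    heap = []
    for i, (a, b, w) in enumerate(R):
        if b in by_b:
            heap.append((w + by_b[b][0][0], i, 0))
    heapq.heapify(heap)

    # Pop the globally cheapest result, then advance that R tuple's stream.
    while heap:
        total, i, j = heapq.heappop(heap)
        a, b, w = R[i]
        yield total, a, b, by_b[b][j][1]
        if j + 1 < len(by_b[b]):
            heapq.heappush(heap, (w + by_b[b][j + 1][0], i, j + 1))


# Hypothetical sample data.
R = [("a1", 1, 3), ("a2", 1, 1), ("a3", 2, 2)]
S = [(1, "c1", 5), (1, "c2", 2), (2, "c3", 1), (3, "c4", 0)]

# Take only the three cheapest join results.
top3 = list(islice(ranked_join(R, S), 3))
print(top3)
```

The queue never holds more than one entry per R tuple, so stopping after k results does only the work needed for those k. Extending this idea to joins with many relations and cycles is where the connection to k-shortest path algorithms mentioned in the abstract comes in.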
