用随机访问和随机顺序枚举回答连接查询的并集

Nofar Carmeli, Shai Zeevi, Christoph Berkholz, A. Conte, B. Kimelfeld, Nicole Schweikardt
{"title":"用随机访问和随机顺序枚举回答连接查询的并集","authors":"Nofar Carmeli, Shai Zeevi, Christoph Berkholz, A. Conte, B. Kimelfeld, Nicole Schweikardt","doi":"10.1145/3531055","DOIUrl":null,"url":null,"abstract":"As data analytics becomes more crucial to digital systems, so grows the importance of characterizing the database queries that admit a more efficient evaluation. We consider the tractability yardstick of answer enumeration with a polylogarithmic delay after a linear-time preprocessing phase. Such an evaluation is obtained by constructing, in the preprocessing phase, a data structure that supports polylogarithmic-delay enumeration. In this article, we seek a structure that supports the more demanding task of a “random permutation”: polylogarithmic-delay enumeration in truly random order. Enumeration of this kind is required if downstream applications assume that the intermediate results are representative of the whole result set in a statistically meaningful manner. An even more demanding task is that of “random access”: polylogarithmic-time retrieval of an answer whose position is given. We establish that the free-connex acyclic CQs are tractable in all three senses: enumeration, random-order enumeration, and random access; and in the absence of self-joins, it follows from past results that every other CQ is intractable by each of the three (under some fine-grained complexity assumptions). However, the three yardsticks are separated in the case of a union of CQs (UCQ): while a union of free-connex acyclic CQs has a tractable enumeration, it may (provably) admit no random access. We identify a fragment of such UCQs where we can guarantee random access with polylogarithmic access time (and linear-time preprocessing) and a more general fragment where we can guarantee tractable random permutation. For general unions of free-connex acyclic CQs, we devise two algorithms with relaxed guarantees: one has logarithmic delay in expectation, and the other provides a permutation that is almost uniformly distributed. Finally, we present an implementation and an empirical study that show a considerable practical superiority of our random-order enumeration approach over state-of-the-art alternatives.","PeriodicalId":6983,"journal":{"name":"ACM Transactions on Database Systems (TODS)","volume":"33 1","pages":"1 - 49"},"PeriodicalIF":0.0000,"publicationDate":"2022-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Answering (Unions of) Conjunctive Queries using Random Access and Random-Order Enumeration\",\"authors\":\"Nofar Carmeli, Shai Zeevi, Christoph Berkholz, A. Conte, B. Kimelfeld, Nicole Schweikardt\",\"doi\":\"10.1145/3531055\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"As data analytics becomes more crucial to digital systems, so grows the importance of characterizing the database queries that admit a more efficient evaluation. We consider the tractability yardstick of answer enumeration with a polylogarithmic delay after a linear-time preprocessing phase. Such an evaluation is obtained by constructing, in the preprocessing phase, a data structure that supports polylogarithmic-delay enumeration. In this article, we seek a structure that supports the more demanding task of a “random permutation”: polylogarithmic-delay enumeration in truly random order. Enumeration of this kind is required if downstream applications assume that the intermediate results are representative of the whole result set in a statistically meaningful manner. An even more demanding task is that of “random access”: polylogarithmic-time retrieval of an answer whose position is given. We establish that the free-connex acyclic CQs are tractable in all three senses: enumeration, random-order enumeration, and random access; and in the absence of self-joins, it follows from past results that every other CQ is intractable by each of the three (under some fine-grained complexity assumptions). However, the three yardsticks are separated in the case of a union of CQs (UCQ): while a union of free-connex acyclic CQs has a tractable enumeration, it may (provably) admit no random access. We identify a fragment of such UCQs where we can guarantee random access with polylogarithmic access time (and linear-time preprocessing) and a more general fragment where we can guarantee tractable random permutation. For general unions of free-connex acyclic CQs, we devise two algorithms with relaxed guarantees: one has logarithmic delay in expectation, and the other provides a permutation that is almost uniformly distributed. Finally, we present an implementation and an empirical study that show a considerable practical superiority of our random-order enumeration approach over state-of-the-art alternatives.\",\"PeriodicalId\":6983,\"journal\":{\"name\":\"ACM Transactions on Database Systems (TODS)\",\"volume\":\"33 1\",\"pages\":\"1 - 49\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-06-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Database Systems (TODS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3531055\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Database Systems (TODS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3531055","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

摘要

随着数据分析对数字系统变得越来越重要,对数据库查询进行特征化以进行更有效的评估也变得越来越重要。我们考虑了经过线性时间预处理阶段后具有多对数延迟的回答枚举的可追溯性尺度。通过在预处理阶段构造一个支持多对数延迟枚举的数据结构,可以获得这样的求值。在本文中,我们寻求一种结构来支持更苛刻的“随机排列”任务:真正随机顺序的多对数延迟枚举。如果下游应用程序假设中间结果以统计上有意义的方式代表整个结果集,则需要进行这种枚举。一个要求更高的任务是“随机存取”:用多对数时间检索给定位置的答案。证明了自由连通无环cq在枚举、随机顺序枚举和随机访问三种意义上都是可处理的;在没有自连接的情况下,从过去的结果可以得出,其他CQ对这三个CQ都是难以处理的(在一些细粒度的复杂性假设下)。然而,在cq的并集(UCQ)的情况下,这三个尺度是分开的:虽然自由连接的无环cq的并集具有可处理的枚举,但它可能(可证明地)不允许随机访问。我们确定了这样的ucq的一个片段,我们可以保证随机访问的多对数访问时间(和线性时间预处理)和一个更一般的片段,我们可以保证可处理的随机排列。对于自由连接无环cq的一般并集,我们设计了两种具有宽松保证的算法:一种算法具有对数期望延迟,另一种算法提供了几乎均匀分布的排列。最后,我们提出了一个实现和实证研究,表明我们的随机顺序枚举方法比最先进的替代方法具有相当大的实际优势。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Answering (Unions of) Conjunctive Queries using Random Access and Random-Order Enumeration
As data analytics becomes more crucial to digital systems, so grows the importance of characterizing the database queries that admit a more efficient evaluation. We consider the tractability yardstick of answer enumeration with a polylogarithmic delay after a linear-time preprocessing phase. Such an evaluation is obtained by constructing, in the preprocessing phase, a data structure that supports polylogarithmic-delay enumeration. In this article, we seek a structure that supports the more demanding task of a “random permutation”: polylogarithmic-delay enumeration in truly random order. Enumeration of this kind is required if downstream applications assume that the intermediate results are representative of the whole result set in a statistically meaningful manner. An even more demanding task is that of “random access”: polylogarithmic-time retrieval of an answer whose position is given. We establish that the free-connex acyclic CQs are tractable in all three senses: enumeration, random-order enumeration, and random access; and in the absence of self-joins, it follows from past results that every other CQ is intractable by each of the three (under some fine-grained complexity assumptions). However, the three yardsticks are separated in the case of a union of CQs (UCQ): while a union of free-connex acyclic CQs has a tractable enumeration, it may (provably) admit no random access. We identify a fragment of such UCQs where we can guarantee random access with polylogarithmic access time (and linear-time preprocessing) and a more general fragment where we can guarantee tractable random permutation. For general unions of free-connex acyclic CQs, we devise two algorithms with relaxed guarantees: one has logarithmic delay in expectation, and the other provides a permutation that is almost uniformly distributed. Finally, we present an implementation and an empirical study that show a considerable practical superiority of our random-order enumeration approach over state-of-the-art alternatives.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信