Estimation of the Size of Union of Delphic Sets: Achieving Independence from Stream Size

Kuldeep S. Meel, Sourav Chakraborty, N. V. Vinodchandran
{"title":"Estimation of the Size of Union of Delphic Sets: Achieving Independence from Stream Size","authors":"Kuldeep S. Meel, Sourav Chakraborty, N. V. Vinodchandran","doi":"10.1145/3517804.3526222","DOIUrl":null,"url":null,"abstract":"Given a family of sets (S1, S2,... SM) over a universe Ω, estimating the size of their union in the data streaming model is a fundamental computational problem with a wide variety of applications. The holy grail in the field of streaming is to seek design of algorithms that achieve (ε, δ)-approximation with poly(log |Ω|, ε-1, log δ-1) space and update time complexity. Earlier investigations achieve algorithms with desired space and update time complexity for restricted cases such as singletons (Distinct Elements problem), one-dimensional ranges, arithmetic progressions, and sub-cubes. However, techniques used in these works fail for many other simple structured sets. A prominent example is that of Klee's Measure Problem (KMP), wherein every set Si is represented by an axis-parallel rectangle in d-dimensional spaces. Despite extensive prior work, the best-known streaming algorithms for many of these cases depend on the size of the stream, and therefore the problem of whether there exists a streaming algorithm for estimations of size of the union of sets with poly(log |Ω|, ε-1, log δ-1) space and update time complexity has remained open. In this work, we focus on certain general families of sets called Delphic families (which allows efficient membership, sampling, and cardinality queries). Such families of sets capture several well-known problems, including KMP, test coverage, and hypervolume estimation. The primary contribution of our work is to resolve the above-mentioned open problem for streams over Delphic families. In particular, we design the first streaming algorithm for estimating |⋃i=1M Si| with poly(log |Ω|, ε-1, log δ-1) space and update time complexity (independent of M, the length of the stream) when each Si is a member from a Delphic family of sets. We further generalize our results to larger families of sets, called approximate-Delphic families, for which the size of a set can be known approximately but not exactly. Our results resolve two of the open problems listed in Meel, Vinodchandran, Chakraborty (PODS-21).","PeriodicalId":230606,"journal":{"name":"Proceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems","volume":"53 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3517804.3526222","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

Given a family of sets (S1, S2,... SM) over a universe Ω, estimating the size of their union in the data streaming model is a fundamental computational problem with a wide variety of applications. The holy grail in the field of streaming is to seek design of algorithms that achieve (ε, δ)-approximation with poly(log |Ω|, ε-1, log δ-1) space and update time complexity. Earlier investigations achieve algorithms with desired space and update time complexity for restricted cases such as singletons (Distinct Elements problem), one-dimensional ranges, arithmetic progressions, and sub-cubes. However, techniques used in these works fail for many other simple structured sets. A prominent example is that of Klee's Measure Problem (KMP), wherein every set Si is represented by an axis-parallel rectangle in d-dimensional spaces. Despite extensive prior work, the best-known streaming algorithms for many of these cases depend on the size of the stream, and therefore the problem of whether there exists a streaming algorithm for estimations of size of the union of sets with poly(log |Ω|, ε-1, log δ-1) space and update time complexity has remained open. In this work, we focus on certain general families of sets called Delphic families (which allows efficient membership, sampling, and cardinality queries). Such families of sets capture several well-known problems, including KMP, test coverage, and hypervolume estimation. The primary contribution of our work is to resolve the above-mentioned open problem for streams over Delphic families. In particular, we design the first streaming algorithm for estimating |⋃i=1M Si| with poly(log |Ω|, ε-1, log δ-1) space and update time complexity (independent of M, the length of the stream) when each Si is a member from a Delphic family of sets. We further generalize our results to larger families of sets, called approximate-Delphic families, for which the size of a set can be known approximately but not exactly. Our results resolve two of the open problems listed in Meel, Vinodchandran, Chakraborty (PODS-21).
德尔菲集并集大小的估计:实现流大小的独立性
给定一个集合族(S1, S2,…SM)在一个宇宙Ω上,在数据流模型中估计它们的联合的大小是一个具有广泛应用的基本计算问题。流媒体领域的圣杯是寻求算法的设计,以poly(log |Ω|, ε-1, log δ-1)空间和更新时间复杂度来实现(ε, δ)-逼近。早期的研究实现了具有所需空间和更新时间复杂度的算法,用于限制情况,如单例(不同元素问题)、一维范围、等差数列和子立方体。然而,在这些作品中使用的技术不适用于许多其他简单的结构集。一个突出的例子是Klee's Measure Problem (KMP),其中每个集合Si都由d维空间中的轴平行矩形表示。尽管之前有大量的工作,最著名的流算法在许多情况下依赖于流的大小,因此是否存在一种流算法来估计具有poly(log |Ω|, ε-1, log δ-1)空间和更新时间复杂度的集的并集的大小的问题仍然是开放的。在这项工作中,我们关注的是被称为德尔菲族的集合的某些一般族(它允许有效的隶属、抽样和基数查询)。这样的集合族捕获了几个众所周知的问题,包括KMP、测试覆盖率和超容量估计。我们工作的主要贡献是解决上述关于德尔菲族的流的开放性问题。特别地,我们设计了第一个流算法,当每个Si是一个德尔菲集合族的成员时,使用poly(log |Ω|, ε-1, log δ-1)空间和更新时间复杂度(与流的长度M无关)来估计|∈i=1M Si|。我们进一步将我们的结果推广到更大的集合族,称为近似德尔菲族,其中集合的大小可以近似地知道,但不是精确地知道。我们的结果解决了Meel, Vinodchandran, Chakraborty (PODS-21)中列出的两个开放问题。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信