Data Streams with Bounded Deletions

Rajesh Jayaram, David P. Woodruff
{"title":"Data Streams with Bounded Deletions","authors":"Rajesh Jayaram, David P. Woodruff","doi":"10.1145/3196959.3196986","DOIUrl":null,"url":null,"abstract":"Two prevalent models in the data stream literature are the insertion-only and turnstile models. Unfortunately, many important streaming problems require a Θ(log(n)) multiplicative factor more space for turnstile streams than for insertion-only streams. This complexity gap often arises because the underlying frequency vector f is very close to $0$, after accounting for all insertions and deletions to items. Signal detection in such streams is difficult, given the large number of deletions. In this work, we propose an intermediate model which, given a parameter α ≥ 1, lower bounds the norm |f|p by a 1/α-fraction of the Lp mass of the stream had all updates been positive. Here, for a vector f, |f|p = (∑i=1n |fi|p)1/p, and the value of p we choose depends on the application. This gives a fluid medium between insertion only streams (with α = 1), and turnstile streams (with α = poly(n)), and allows for analysis in terms of α. We show that for streams with this α-property, for many fundamental streaming problems we can replace a O(log(n)) factor in the space usage for algorithms in the turnstile model with a O(log(α)) factor. This is true for identifying heavy hitters, inner product estimation, L0 estimation, L1 estimation, L1 sampling, and support sampling. For each problem, we give matching or nearly matching lower bounds for α-property streams. We note that in practice, many important turnstile data streams are in fact α-property streams for small values of α. For such applications, our results represent significant improvements in efficiency for all the aforementioned problems.","PeriodicalId":344370,"journal":{"name":"Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems","volume":"77 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-03-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"17","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3196959.3196986","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 17

Abstract

Two prevalent models in the data stream literature are the insertion-only and turnstile models. Unfortunately, many important streaming problems require a Θ(log(n)) multiplicative factor more space for turnstile streams than for insertion-only streams. This complexity gap often arises because the underlying frequency vector f is very close to $0$, after accounting for all insertions and deletions to items. Signal detection in such streams is difficult, given the large number of deletions. In this work, we propose an intermediate model which, given a parameter α ≥ 1, lower bounds the norm |f|p by a 1/α-fraction of the Lp mass of the stream had all updates been positive. Here, for a vector f, |f|p = (∑i=1n |fi|p)1/p, and the value of p we choose depends on the application. This gives a fluid medium between insertion only streams (with α = 1), and turnstile streams (with α = poly(n)), and allows for analysis in terms of α. We show that for streams with this α-property, for many fundamental streaming problems we can replace a O(log(n)) factor in the space usage for algorithms in the turnstile model with a O(log(α)) factor. This is true for identifying heavy hitters, inner product estimation, L0 estimation, L1 estimation, L1 sampling, and support sampling. For each problem, we give matching or nearly matching lower bounds for α-property streams. We note that in practice, many important turnstile data streams are in fact α-property streams for small values of α. For such applications, our results represent significant improvements in efficiency for all the aforementioned problems.
有界删除的数据流
数据流文献中流行的两种模型是纯插入模型和旋转门模型。不幸的是,许多重要的流问题需要Θ(log(n))的乘法因子,对于旋转门流比仅插入流有更多的空间。这种复杂性差距经常出现,因为底层频率向量f非常接近$0$,在考虑了所有条目的插入和删除之后。由于大量的删除,在这样的流中检测信号是困难的。在这项工作中,我们提出了一个中间模型,当参数α≥1时,流的Lp质量的1/α分数的范数|的|p的下界所有更新都是正的。这里,对于向量f, |f|p =(∑i=1n |fi|p)1/p,我们选择的p的值取决于应用。这给出了仅插入流(α = 1)和旋转门流(α = poly(n))之间的流体介质,并允许根据α进行分析。我们证明,对于具有这种α-性质的流,对于许多基本的流问题,我们可以用O(log(α))因子代替旋转门模型中算法的空间使用中的O(log(n))因子。这对于识别重量级人物、内积估计、L0估计、L1估计、L1抽样和支持抽样都是正确的。对于每个问题,我们给出了α-性质流的匹配或近似匹配下界。我们注意到,在实践中,许多重要的旋转门数据流实际上是α-性质流对于小的α值。对于这样的应用,我们的结果代表了上述所有问题的效率的显著提高。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信