A simple proof of a new set disjointness with applications to data streams

Akshay Kamath, Eric Price, David P. Woodruff
Proceedings of the 36th Computational Complexity Conference (CCC 2021), published 2021-05-24
DOI: 10.4230/LIPIcs.CCC.2021.37 (https://doi.org/10.4230/LIPIcs.CCC.2021.37)
Citations: 12

Abstract

The multiplayer promise set disjointness problem is one of the most widely used problems from communication complexity in applications. In this problem there are k players with subsets S_1, ..., S_k, each drawn from {1, 2, ..., n}, and we are promised that either (1) the sets are pairwise disjoint, or (2) there is a unique element j occurring in all the sets, which are otherwise pairwise disjoint. The total communication required to solve this problem with constant probability in the blackboard model is Ω(n/k). We observe that for most applications it instead suffices to look at what we call the "mostly" set disjointness problem, which changes case (2) to say there is a unique element j occurring in at least half of the sets, and the sets are otherwise disjoint. This change gives us a much simpler proof of an Ω(n/k) randomized total communication lower bound, avoiding Hellinger distance and Poincaré inequalities. Our proof also gives strong lower bounds for high probability protocols, which are much larger than what is possible for the set disjointness problem. Using this we show several new results for data streams:

1. For ℓ_2-Heavy Hitters, any O(1)-pass streaming algorithm in the insertion-only model for detecting if an ε-ℓ_2-heavy hitter exists requires [EQUATION] bits of memory, which is optimal up to a log n factor. For deterministic algorithms and constant ε, this gives an Ω(n^{1/2}) lower bound, improving the prior Ω(log n) lower bound. We also obtain lower bounds for Zipfian distributions.
2. For ℓ_p-Estimation, p > 2, we show an O(1)-pass Ω(n^{1-2/p} log(1/δ)) bit lower bound for outputting an O(1)-approximation with probability 1 − δ, in the insertion-only model. This is optimal, and the best previous lower bound was Ω(n^{1-2/p} + log(1/δ)).
3. For low rank approximation of a sparse matrix in R^{d×n}, if we see the rows of a matrix one at a time in the row-order model, each row having O(1) non-zero entries, any deterministic algorithm requires [EQUATION] memory to output an O(1)-approximate rank-1 approximation.

Finally, we consider strict and general turnstile streaming models, and show separations between sketching lower bounds and non-sketching upper bounds for the heavy hitters problem.
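
To make the two promise cases concrete, here is a minimal Python sketch that classifies a list of k sets according to the "mostly" set disjointness promise described above. The function name `classify_instance` and the returned labels are hypothetical, chosen only for illustration. The sketch inspects all sets centrally, so it illustrates the problem definition only; it is not a communication protocol and says nothing about the lower bound itself.

```python
from collections import Counter
from typing import List, Set


def classify_instance(sets: List[Set[int]]) -> str:
    """Classify k sets under the "mostly" set disjointness promise.

    Returns "disjoint" if the sets are pairwise disjoint, "intersecting" if
    exactly one element is shared and it occurs in at least half of the sets
    (every other element occurring in at most one set), and
    "promise violated" otherwise. The labels are illustrative assumptions.
    """
    k = len(sets)
    # Count how many of the k sets contain each element.
    counts = Counter(x for s in sets for x in s)
    # Elements that occur in more than one set break pairwise disjointness.
    shared = [x for x, c in counts.items() if c > 1]
    if not shared:
        return "disjoint"
    # Case (2): a unique shared element occurring in at least k/2 sets.
    if len(shared) == 1 and 2 * counts[shared[0]] >= k:
        return "intersecting"
    return "promise violated"
```

The original promise problem, where the shared element occurs in all k sets, is a special case of the "intersecting" branch, for example `classify_instance([{1, 9}, {2, 9}, {3, 9}])`.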