Towards Optimal Moment Estimation in Streaming and Distributed Models
Rajesh Jayaram, David P. Woodruff
ACM Transactions on Algorithms (impact factor 0.9; JCR Q3, Computer Science, Theory & Methods), published 2023-06-24
DOI: https://dl.acm.org/doi/10.1145/3596494
Citations: 0
Abstract
One of the oldest problems in the data stream model is to approximate the \(p\)th moment \(\Vert \mathbf{X}\Vert_p^p = \sum_{i=1}^n \mathbf{X}_i^p\) of an underlying non-negative vector \(\mathbf{X} \in \mathbb{R}^n\), which is presented as a sequence of \(\mathrm{poly}(n)\) updates to its coordinates. Of particular interest is when \(p \in (0,2]\). Although a tight space bound of \(\Theta(\epsilon^{-2} \log n)\) bits is known for this problem when both positive and negative updates are allowed, surprisingly, there is still a gap in the space complexity of this problem when all updates are positive. Specifically, the upper bound is \(O(\epsilon^{-2} \log n)\) bits, while the lower bound is only \(\Omega(\epsilon^{-2} + \log n)\) bits. Recently, an upper bound of \(\tilde{O}(\epsilon^{-2} + \log n)\) bits was obtained under the assumption that the updates arrive in a random order.
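To make the problem statement concrete, the following is a minimal sketch of the quantity being approximated, computed exactly from a stream of positive updates. It is a linear-space baseline for illustration only, not one of the small-space algorithms the abstract refers to; the function name `exact_moment` and the `(index, delta)` update format are assumptions introduced here.

```python
from collections import defaultdict


def exact_moment(updates, p):
    """Return sum_i X_i^p for the vector X built from (index, delta) updates.

    Hypothetical helper: it stores every non-zero coordinate, so it uses
    space linear in the support of X, unlike a streaming sketch.
    """
    x = defaultdict(int)
    for index, delta in updates:
        if delta <= 0:
            # The gap discussed above concerns streams with positive updates only.
            raise ValueError("expected a positive update")
        x[index] += delta
    return sum(value ** p for value in x.values())


# Example stream: coordinate 0 receives two updates, so X = (2, 4, 3).
stream = [(0, 1), (2, 3), (0, 1), (1, 4)]
first_moment = exact_moment(stream, p=1)
second_moment = exact_moment(stream, p=2)
```

A streaming algorithm would replace the dictionary `x` with a much smaller summary and return a \((1 \pm \epsilon)\) approximation of the same value.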
We show that for \(p \in (0, 1]\), the random order assumption is not needed. Namely, we give an upper bound for worst-case streams of \(\tilde{O}(\epsilon^{-2} + \log n)\) bits for estimating \(\Vert \mathbf{X}\Vert_p^p\). Our techniques also give new upper bounds for estimating the empirical entropy in a stream. On the other hand, we show that for \(p \in (1,2]\), in the natural coordinator and blackboard distributed communication topologies, there is an \(\tilde{O}(\epsilon^{-2})\) bit max-communication upper bound based on a randomized rounding scheme. Our protocols also give rise to protocols for heavy hitters and approximate matrix product. We generalize our results to arbitrary communication topologies \(G\), obtaining an \(\tilde{O}(\epsilon^{-2} \log d)\) max-communication upper bound, where \(d\) is the diameter of \(G\). Interestingly, our upper bound rules out natural communication complexity-based approaches for proving an \(\Omega(\epsilon^{-2} \log n)\) bit lower bound for \(p \in (1,2]\) for streaming algorithms. In particular, any such lower bound must come from a topology with large diameter.
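The abstract does not spell out its randomized rounding scheme. As a generic illustration of the idea, the sketch below rounds a value to one of the two neighbouring powers of \(1 + \gamma\), with probabilities chosen so that the rounded value is correct in expectation; a party can then send only the small integer exponent. The function name `round_to_power`, the parameter `gamma`, and the choice of geometric grid are assumptions made here and may differ from the paper's actual protocol.

```python
import math
import random


def round_to_power(x, gamma, rng):
    """Unbiasedly round x >= 1 to a power of (1 + gamma); return the exponent k.

    Hypothetical illustration: with lo = (1+gamma)^k <= x < hi = (1+gamma)^(k+1),
    round up with probability (x - lo) / (hi - lo), so E[(1+gamma)^result] = x.
    """
    if x < 1 or gamma <= 0:
        raise ValueError("expected x >= 1 and gamma > 0")
    base = 1.0 + gamma
    k = math.floor(math.log(x) / math.log(base))
    # Guard against floating-point error in the logarithm.
    while base ** k > x:
        k -= 1
    while base ** (k + 1) <= x:
        k += 1
    lo, hi = base ** k, base ** (k + 1)
    prob_up = (x - lo) / (hi - lo)
    return k + 1 if rng.random() < prob_up else k


# Averaging many independent roundings recovers x up to sampling noise.
rng = random.Random(0)
gamma = 0.5
exponents = [round_to_power(10.0, gamma, rng) for _ in range(20000)]
average = sum((1.0 + gamma) ** k for k in exponents) / len(exponents)
```

The design point is that an exponent near \(\log_{1+\gamma} x\) takes far fewer bits to write down than \(x\) itself, while unbiasedness keeps the rounding errors from accumulating systematically when many rounded values are summed.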
About the journal:
ACM Transactions on Algorithms welcomes submissions of original research of the highest quality dealing with algorithms that are inherently discrete and finite, and having mathematical content in a natural way, either in the objective or in the analysis. Most welcome are new algorithms and data structures, new and improved analyses, and complexity results. Specific areas of computation covered by the journal include
combinatorial searches and objects;
counting;
discrete optimization and approximation;
randomization and quantum computation;
parallel and distributed computation;
algorithms for graphs, geometry, arithmetic, number theory, strings;
on-line analysis;
cryptography;
coding;
data compression;
learning algorithms;
methods of algorithmic analysis;
discrete algorithms for application areas such as biology, economics, game theory, communication, computer systems and architecture, hardware design, scientific computing.