Pseudorandom Hashing for Space-bounded Computation with Applications in Streaming.

Proceedings ... annual Symposium on Foundations of Computer Science. Symposium on Foundations of Computer Science. Pub Date: 2023-11-01. Epub Date: 2023-12-22. DOI: 10.1109/focs57990.2023.00093

Praneeth Kacham, Rasmus Pagh, Mikkel Thorup, David P Woodruff

Volume 2023, pages 1515-1550. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12309723/pdf/


Abstract

We revisit Nisan's classical pseudorandom generator (PRG) for space-bounded computation (STOC 1990) and its applications in streaming algorithms. We describe a new generator, HashPRG, that can be thought of as a symmetric version of Nisan's generator over larger alphabets. Our generator allows a trade-off between seed length and the time needed to compute a given block of the generator's output. HashPRG can be used to obtain derandomizations with much better update time and without sacrificing space for a large number of data stream algorithms, for example:

- Andoni's $F_p$ estimation algorithm for constant $p > 2$ (ICASSP, 2017) assumes a random oracle, but achieves optimal space and constant update time. Using HashPRG's time-space trade-off we eliminate the random oracle assumption while preserving the other properties. Previously no time-optimal derandomization was known. Using similar techniques, we give an algorithm for a relaxed version of $\ell_p$ sampling in a turnstile stream. Both of our algorithms use $\tilde{O}(d^{1 - 2/p})$ bits of space and have $O(1)$ update time.
- For $0 < p < 2$, the $1 \pm \varepsilon$ approximate $F_p$ estimation algorithm of Kane et al. (STOC, 2011) uses an optimal $O(\varepsilon^{-2} \log d)$ bits of space but has an update time of $O(\log^{2}(1/\varepsilon) \log\log(1/\varepsilon))$. Using HashPRG, we show that if $1/\sqrt{d} \leq \varepsilon \leq 1/d^{c}$ for an arbitrarily small constant $c > 0$, then we can obtain a $1 \pm \varepsilon$ approximate $F_p$ estimation algorithm that uses the optimal $O(\varepsilon^{-2} \log d)$ bits of space and has an update time of $O(\log d)$ in the Word RAM model, which is more than a quadratic improvement in the update time. We obtain similar improvements for entropy estimation.
- CountSketch, with the fine-grained error analysis of Minton and Price (SODA, 2014). For derandomization, they suggested a direct application of Nisan's generator, yielding a logarithmic multiplicative space overhead. With HashPRG we obtain an efficient derandomization yielding the same asymptotic space as when assuming a random oracle. Our ability to obtain a time-efficient derandomization makes crucial use of HashPRG's symmetry. We also give the first derandomization of a recent private version of CountSketch.

For a $d$-dimensional vector $x$ being updated in a turnstile stream, we show that $\|x\|_{\infty}$ can be estimated up to an additive error of $\varepsilon \|x\|_{2}$ using $O(\varepsilon^{-2} \log(1/\varepsilon) \log d)$ bits of space. Additionally, the update time of this algorithm is $O(\log(1/\varepsilon))$ in the Word RAM model. We show that the space complexity of this algorithm is optimal up to constant factors. However, for vectors $x$ with $\|x\|_{\infty} = \Theta(\|x\|_{2})$, we show that the lower bound can be broken by giving an algorithm that uses $O(\varepsilon^{-2} \log d)$ bits of space and approximates $\|x\|_{\infty}$ up to an additive error of $\varepsilon \|x\|_{2}$. We use our aforementioned derandomization of the CountSketch data structure to obtain this algorithm, and using the time-space trade-off of HashPRG, we show that the update time of this algorithm is also $O(\log(1/\varepsilon))$ in the Word RAM model.
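
To make "the time needed to compute a given block of the generator's output" concrete, here is a minimal sketch of Nisan's classical recursion, not of HashPRG itself, whose construction the abstract does not spell out. The names `expand`, `block`, `make_seed`, the modulus `P`, and the affine hash family over the integers mod `P` are all illustrative assumptions; Nisan's generator is stated for pairwise independent hash functions on bit strings.

```python
import random

# Assumption: blocks are integers mod a Mersenne prime, and h(x) = (a*x + b) mod P
# stands in for a pairwise independent hash family.
P = (1 << 61) - 1


def make_seed(k, rng):
    """Seed = one start block x plus k hash functions, each a pair (a, b)."""
    x = rng.randrange(P)
    hs = [(rng.randrange(P), rng.randrange(P)) for _ in range(k)]
    return x, hs


def apply_hash(h, x):
    a, b = h
    return (a * x + b) % P


def expand(x, hs):
    """Full output: G_0(x) = [x], G_i(x) = G_{i-1}(x) + G_{i-1}(h_i(x))."""
    if not hs:
        return [x]
    return expand(x, hs[:-1]) + expand(apply_hash(hs[-1], x), hs[:-1])


def block(x, hs, j):
    """Block j alone: apply h_i for every set bit of j, highest bit first."""
    for i in range(len(hs) - 1, -1, -1):
        if (j >> i) & 1:
            x = apply_hash(hs[i], x)
    return x


x, hs = make_seed(4, random.Random(0))
blocks = expand(x, hs)  # 2**4 = 16 output blocks
```

In this sketch a seed of k hash functions yields 2^k blocks, and a single block costs at most k hash evaluations, which is the per-update cost a streaming algorithm pays when it reads its randomness from the generator.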

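For readers unfamiliar with the data structure being derandomized, the following is a minimal sketch of the textbook CountSketch with point queries in a turnstile stream. It is not the paper's derandomized variant: the per-row affine hashes drawn from Python's `random` module are an assumed stand-in for the randomness that the paper obtains from HashPRG, and the class and method names are hypothetical.

```python
import random
from statistics import median

P = (1 << 61) - 1  # assumption: prime modulus for the illustrative hash family


class CountSketch:
    def __init__(self, rows, width, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.table = [[0] * width for _ in range(rows)]
        # Per row: (a, b) define the bucket hash, (c, d) define the sign hash.
        self.params = [
            tuple(rng.randrange(P) for _ in range(4)) for _ in range(rows)
        ]

    def _bucket_sign(self, r, i):
        a, b, c, d = self.params[r]
        bucket = ((a * i + b) % P) % self.width
        sign = 1 if ((c * i + d) % P) & 1 else -1
        return bucket, sign

    def update(self, i, delta):
        """Turnstile update x_i += delta (delta may be negative)."""
        for r, row in enumerate(self.table):
            bucket, sign = self._bucket_sign(r, i)
            row[bucket] += sign * delta

    def estimate(self, i):
        """Point query: median over rows of the signed bucket contents."""
        vals = []
        for r, row in enumerate(self.table):
            bucket, sign = self._bucket_sign(r, i)
            vals.append(sign * row[bucket])
        return median(vals)


cs = CountSketch(rows=5, width=64)
cs.update(7, 5)
cs.update(7, -2)
```

Each update touches one counter per row, so the update time is governed by the number of rows and by how quickly the hash values can be produced, which is where a time-efficient generator matters.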
