Pseudorandom Hashing for Space-bounded Computation with Applications in Streaming
Praneeth Kacham, Rasmus Pagh, Mikkel Thorup, David P. Woodruff
Proceedings of the Annual Symposium on Foundations of Computer Science (FOCS 2023), pp. 1515-1550. DOI: 10.1109/focs57990.2023.00093
Abstract
We revisit Nisan's classical pseudorandom generator (PRG) for space-bounded computation (STOC 1990) and its applications in streaming algorithms. We describe a new generator, HashPRG, that can be thought of as a symmetric version of Nisan's generator over larger alphabets. Our generator allows a trade-off between seed length and the time needed to compute a given block of the generator's output. HashPRG can be used to obtain derandomizations with much better update time and without sacrificing space for a large number of data stream algorithms, for example:

- Andoni's $F_p$ estimation algorithm for constant $p > 2$ (ICASSP, 2017) assumes a random oracle, but achieves optimal space and constant update time. Using HashPRG's time-space trade-off we eliminate the random oracle assumption while preserving the other properties. Previously no time-optimal derandomization was known. Using similar techniques, we give an algorithm for a relaxed version of $\ell_p$ sampling in a turnstile stream. Both of our algorithms use $\tilde{O}(d^{1-2/p})$ bits of space and have $O(1)$ update time.

- For $0 < p < 2$, the $1 \pm \varepsilon$ approximate $F_p$ estimation algorithm of Kane et al. (STOC, 2011) uses an optimal $O(\varepsilon^{-2} \log d)$ bits of space but has an update time of $O(\log^2(1/\varepsilon) \log\log(1/\varepsilon))$. Using HashPRG, we show that if $1/\sqrt{d} \le \varepsilon \le 1/d^c$ for an arbitrarily small constant $c > 0$, then we can obtain a $1 \pm \varepsilon$ approximate $F_p$ estimation algorithm that uses the optimal $O(\varepsilon^{-2} \log d)$ bits of space and has an update time of $O(\log d)$ in the Word RAM model, which is more than a quadratic improvement in the update time. We obtain similar improvements for entropy estimation.

- CountSketch, with the fine-grained error analysis of Minton and Price (SODA, 2014). For derandomization, they suggested a direct application of Nisan's generator, yielding a logarithmic multiplicative space overhead. With HashPRG we obtain an efficient derandomization yielding the same asymptotic space as when assuming a random oracle. Our ability to obtain a time-efficient derandomization makes crucial use of HashPRG's symmetry. We also give the first derandomization of a recent private version of CountSketch.

For a $d$-dimensional vector $x$ being updated in a turnstile stream, we show that $\|x\|_\infty$ can be estimated up to an additive error of $\varepsilon \|x\|_2$ using $O(\varepsilon^{-2} \log(1/\varepsilon) \log d)$ bits of space. Additionally, the update time of this algorithm is $O(\log(1/\varepsilon))$ in the Word RAM model. We show that the space complexity of this algorithm is optimal up to constant factors. However, for vectors $x$ with $\|x\|_\infty = \Theta(\|x\|_2)$, we show that this lower bound can be broken by giving an algorithm that uses $O(\varepsilon^{-2} \log d)$ bits of space and approximates $\|x\|_\infty$ up to an additive error of $\varepsilon \|x\|_2$. We use our aforementioned derandomization of the CountSketch data structure to obtain this algorithm, and using the time-space trade-off of HashPRG, we show that the update time of this algorithm is also $O(\log(1/\varepsilon))$ in the Word RAM model.
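
To make the phrase "the time needed to compute a given block of the generator's output" concrete, here is a minimal Python sketch of the recursive layout of a Nisan-style generator. It illustrates only the block-indexing structure, not the paper's construction or its pseudorandomness guarantee: a keyed BLAKE2 hash stands in for the pairwise-independent hash family, and all class, parameter, and variable names are placeholders chosen for this illustration. Roughly, the abstract's "larger alphabets" corresponds to replacing the binary recursion below with a larger branching factor, shortening the path from the seed to any output block at the cost of a longer seed.

```python
# Sketch only: block-indexing structure of a Nisan-style PRG with 2**depth output
# blocks. Block i is obtained by applying at most `depth` hash functions to the
# base block, selected by the binary expansion of i. Keyed BLAKE2 is a stand-in
# for the generator's actual hash family (an assumption for illustration).
import hashlib
import os


class NisanStylePRG:
    def __init__(self, block_bytes: int = 8, depth: int = 20):
        self.block_bytes = block_bytes
        self.depth = depth                        # output has 2**depth blocks
        self.x = os.urandom(block_bytes)          # base block of the seed
        self.keys = [os.urandom(16) for _ in range(depth)]  # one hash per level

    def _h(self, level: int, block: bytes) -> bytes:
        # Stand-in for the level-`level` hash function stored in the seed.
        return hashlib.blake2b(block, key=self.keys[level],
                               digest_size=self.block_bytes).digest()

    def block(self, i: int) -> bytes:
        # G_k(x) = G_{k-1}(x) || G_{k-1}(h_k(x)), so the bits of i (MSB first)
        # decide, level by level, whether that level's hash is applied.
        assert 0 <= i < 2 ** self.depth
        y = self.x
        for level in range(self.depth - 1, -1, -1):
            if (i >> level) & 1:
                y = self._h(level, y)
        return y


prg = NisanStylePRG()
print(prg.block(0).hex(), prg.block(12345).hex())
```

In this layout a streaming algorithm can fetch the randomness for stream item $i$ on demand with $O(\text{depth})$ hash evaluations, which is what makes per-update cost, rather than only total seed length, the relevant resource.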
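
The CountSketch and $\ell_\infty$ results above refer to the following kind of data structure. The sketch below is a basic, fully random (random-oracle style) CountSketch for turnstile streams, written to show the update and point-query operations whose guarantees the paper derandomizes; it is not the derandomized algorithm itself. Python's `random` module stands in for the hash and sign functions, the `linf_estimate` helper over a candidate set is our simplification, and the width/depth settings in the comments follow the standard analysis rather than the paper's exact parameters.

```python
# Sketch only: CountSketch with fully random hashing as a stand-in for the
# derandomized hash functions constructed in the paper.
import random
from statistics import median


class CountSketch:
    def __init__(self, width: int, depth: int, seed: int = 0):
        self.width, self.depth = width, depth
        self.table = [[0.0] * width for _ in range(depth)]
        self._rng = random.Random(seed)
        # Lazily materialized "random oracle": bucket and sign per coordinate.
        self._bucket = [{} for _ in range(depth)]
        self._sign = [{} for _ in range(depth)]

    def _hs(self, row: int, i: int):
        if i not in self._bucket[row]:
            self._bucket[row][i] = self._rng.randrange(self.width)
            self._sign[row][i] = self._rng.choice((-1, 1))
        return self._bucket[row][i], self._sign[row][i]

    def update(self, i: int, delta: float) -> None:
        # Turnstile update x_i += delta: O(depth) work per stream item.
        for row in range(self.depth):
            b, s = self._hs(row, i)
            self.table[row][b] += s * delta

    def estimate(self, i: int) -> float:
        # Point query: median over rows of the signed counter for coordinate i.
        return median(s * self.table[row][b]
                      for row in range(self.depth)
                      for b, s in [self._hs(row, i)])

    def linf_estimate(self, candidates) -> float:
        # Naive l_infinity estimate over a candidate coordinate set; the standard
        # analysis gives additive error about eps * ||x||_2 per point query when
        # width = Theta(1/eps^2) and depth = Theta(log(1/failure_prob)).
        return max(abs(self.estimate(i)) for i in candidates)


cs = CountSketch(width=256, depth=7, seed=1)
for i, delta in [(3, 10.0), (5, -2.0), (3, 4.0), (9, 1.0)]:
    cs.update(i, delta)
print(cs.estimate(3), cs.linf_estimate([3, 5, 9]))
```

Storing the hash tables explicitly, as above, defeats the space bound; the point of the paper is that HashPRG supplies the per-coordinate hash values from a short seed, quickly enough to keep the stated update times, and without the logarithmic space overhead of plugging in Nisan's generator directly.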