Balls and Bins: Smaller Hash Families and Faster Evaluation

L. Elisa Celis, Omer Reingold, G. Segev, Udi Wieder
2011 IEEE 52nd Annual Symposium on Foundations of Computer Science. Published 2011-10-22. DOI: 10.1137/120871626 (https://doi.org/10.1137/120871626)
Citations: 49

Abstract

A fundamental fact in the analysis of randomized algorithms is that when n balls are hashed into n bins independently and uniformly at random, with high probability each bin contains at most O(log n / log(log n)) balls. In various applications, however, the assumption that a truly random hash function is available is not always valid, and explicit functions are required. In this paper we study the size of families (or, equivalently, the description length of their functions) that guarantee a maximal load of O(log n / log(log n)) with high probability, as well as the evaluation time of their functions. Whereas such functions must be described using Omega(log n) bits, the best upper bound was formerly O(log^2 n / log(log n)) bits, which is attained by O(log n / log(log n))-wise independent functions. Traditional constructions of the latter offer an evaluation time of O(log n / log(log n)), which according to Siegel's lower bound [FOCS '89] can be reduced only at the cost of significantly increasing the description length. We construct two families that guarantee a maximal load of O(log n / log(log n)) with high probability. Our constructions are based on two different approaches, and exhibit different trade-offs between the description length and the evaluation time. The first construction shows that O(log n / log(log n))-wise independence can in fact be replaced by "gradually increasing independence", resulting in functions that are described using O(log n log(log n)) bits and evaluated in time O(log n log(log n)). The second construction is based on derandomization techniques for space-bounded computations combined with a tailored construction of a pseudorandom generator, resulting in functions that are described using O(log^(3/2) n) bits and evaluated in time O(sqrt(log n)). The latter can be compared to Siegel's lower bound stating that O(log n / log(log n))-wise independent functions that are evaluated in time O(sqrt(log n)) must be described using Omega(2^(sqrt(log n))) bits.
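
To make the baseline in the abstract concrete, the following is a minimal simulation sketch. It is not either of the paper's two constructions. It compares the maximal load under a simulated truly random assignment with the load under the classical k-wise independent family given by a random polynomial of degree k-1 over a prime field, which is one traditional way to obtain the O(log^2 n / log(log n))-bit, O(log n / log(log n))-time baseline. The names PRIME, make_kwise_hash, max_load_truly_random, max_load_kwise and reference_bound, and the choice of the Mersenne prime 2^61 - 1, are assumptions made for illustration only.

```python
import math
import random

# Field size for polynomial hashing. The Mersenne prime 2^61 - 1 is an
# illustrative assumption, not a parameter taken from the paper.
PRIME = (1 << 61) - 1


def max_load_truly_random(n, rng):
    """Throw n balls into n bins independently and uniformly; return the max load."""
    loads = [0] * n
    for _ in range(n):
        loads[rng.randrange(n)] += 1
    return max(loads)


def make_kwise_hash(k, n, rng):
    """Sample a hash function from the classical k-wise independent family.

    The function is a random polynomial of degree k-1 over GF(PRIME), so its
    description is k field elements. The final reduction mod n makes the
    output only approximately uniform over the n bins.
    """
    coeffs = [rng.randrange(PRIME) for _ in range(k)]

    def h(x):
        acc = 0
        for c in coeffs:  # Horner's rule: k multiply-add steps per evaluation
            acc = (acc * x + c) % PRIME
        return acc % n

    return h


def max_load_kwise(n, k, rng):
    """Hash the keys 0..n-1 into n bins with one sampled function; return the max load."""
    h = make_kwise_hash(k, n, rng)
    loads = [0] * n
    for x in range(n):
        loads[h(x)] += 1
    return max(loads)


def reference_bound(n):
    """The quantity log n / log(log n), without the hidden constant of the O-notation."""
    return math.log(n) / math.log(math.log(n))


if __name__ == "__main__":
    rng = random.Random(2011)
    n = 1 << 14
    k = math.ceil(reference_bound(n))
    print("log n / log log n :", round(reference_bound(n), 2))
    print("truly random max  :", max_load_truly_random(n, rng))
    print("k-wise max (k=%d)  :" % k, max_load_kwise(n, k, rng))
```

In this sketch each function is stored as k coefficients and evaluated with k multiply-add steps, so with k on the order of log n / log(log n) and coefficients of O(log n) bits it mirrors the description length and evaluation time that the abstract attributes to traditional constructions. The empirical loads are only indicative, since the asymptotic bounds hide constants.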