Linear Hashing Is Awesome
M. B. T. Knudsen
2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), October 2016
DOI: 10.1109/FOCS.2016.45 (https://doi.org/10.1109/FOCS.2016.45)
Citations: 9
Abstract
The most classic textbook hash function, taught for example in CLRS [MIT Press'09], is h(x) = ((ax + b) mod p) mod m, (◇) where x, a, b ∈ {0, 1, ..., p−1} and a, b are chosen uniformly at random. It is known that (◇) is 2-independent and almost uniform provided p is a prime and p ≫ m. This implies that when using (◇) to build a hash table with chaining that contains n ≤ m keys, the expected query time is O(1) and the expected length of the longest chain is O(√n). This result holds for any 2-independent hash function. No hash function can improve on the expected query time, but the upper bound on the expected length of the longest chain is not known to be tight for (◇). Partially addressing this problem, Alon et al. [STOC'97] proved the existence of a class of linear hash functions such that the expected length of the longest chain is Ω(√n), and left as an open problem to decide which nontrivial properties (◇) has. We make the first progress on this fundamental problem by showing that the expected length of the longest chain is at most n^(1/3+o(1)), which means that the performance of (◇) is similar to that of a 3-independent hash function, for which we can prove an upper bound of O(n^(1/3)). As a lemma we show that within a fixed set of integers there are few pairs such that the height of the ratio of the pair is small. Given two non-zero coprime integers n, m ∈ ℤ, the height of n/m is max{|n|, |m|}; the height is a way of measuring how complex a fraction is. The lemma is proved using a mixture of techniques from additive combinatorics and number theory, and we believe that the result might be of independent interest. For a natural variation of (◇), we show that it is possible to apply second-order moment bounds even when a hash value is fixed. As a consequence: for min-wise hashing it was known that any key from a set of n keys has the smallest hash value with probability O(1/√n). We improve this to n^(−1+o(1)). For linear probing it was known that the worst-case expected query time is O(√n). We improve this to n^(o(1)).
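The following is a minimal Python sketch of (◇) and of the height of a fraction, meant only as an illustration and not taken from the paper. The names `make_linear_hash`, `longest_chain`, and `height` are hypothetical, the prime p = 2^61 − 1 is an arbitrary choice satisfying p ≫ m, and `height` reduces its arguments to lowest terms first, whereas the abstract defines the height only for coprime pairs.

```python
import random
from math import gcd

# Assumption: an arbitrary prime much larger than the table size m.
P = 2**61 - 1


def make_linear_hash(p, m, rng):
    """Draw a, b uniformly from {0, ..., p-1}; return h(x) = ((a*x + b) mod p) mod m."""
    a = rng.randrange(p)
    b = rng.randrange(p)

    def h(x):
        return ((a * x + b) % p) % m

    return h


def longest_chain(keys, h, m):
    """Length of the longest chain when `keys` are hashed into m buckets with chaining."""
    counts = [0] * m
    for x in keys:
        counts[h(x)] += 1
    return max(counts)


def height(n, m):
    """Height of the fraction n/m: max(|n|, |m|) after reducing to lowest terms."""
    if n == 0 or m == 0:
        raise ValueError("height is defined here only for non-zero integers")
    g = gcd(n, m)
    return max(abs(n) // g, abs(m) // g)


if __name__ == "__main__":
    rng = random.Random()
    n = m = 1024                      # n <= m keys, as in the abstract
    h = make_linear_hash(P, m, rng)
    print("longest chain:", longest_chain(range(n), h, m))
    print("height of 6/4:", height(6, 4))
```

Repeating the experiment over many independent draws of a and b gives an empirical estimate of the expected longest chain for one fixed key set. Such an estimate says nothing about worst-case key sets, which are what the bounds in the abstract concern.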