Backyard Cuckoo Hashing: Constant Worst-Case Operations with a Succinct Representation
Yuriy Arbitman, M. Naor, G. Segev
2010 IEEE 51st Annual Symposium on Foundations of Computer Science
DOI: 10.1109/FOCS.2010.80
Citations: 88
Abstract
The performance of a dynamic dictionary is measured mainly by its update time, lookup time, and space consumption. In terms of update time and lookup time, there are known constructions that guarantee constant-time operations in the worst case with high probability, and in terms of space consumption, there are known constructions that use essentially optimal space. However, although the first analysis of a dynamic dictionary dates back more than 45 years (to Knuth's analysis of linear probing in 1963), the trade-off between these aspects of performance is still not completely understood. In this paper we settle two fundamental open problems: \begin{itemize} \item We construct the first dynamic dictionary that enjoys the best of both worlds: it stores $\boldsymbol{n}$ elements using $\boldsymbol{(1 + \epsilon) n}$ memory words, and guarantees constant-time operations in the worst case with high probability. Specifically, for any $\boldsymbol{\epsilon = \Omega( (\log \log n / \log n)^{1/2} )}$ and for any sequence of polynomially many operations, with high probability over the randomness of the initialization phase, all operations are performed in constant time, independent of $\boldsymbol{\epsilon}$. The construction is a two-level variant of cuckoo hashing, augmented with a ``backyard'' that handles a large fraction of the elements, together with a de-amortized perfect hashing scheme for eliminating the dependency on $\boldsymbol{\epsilon}$. \item We present a variant of the above construction that uses only $\boldsymbol{(1 + o(1))\mathcal{B}}$ bits, where $\boldsymbol{\mathcal{B}}$ is the information-theoretic lower bound for representing a set of size $\boldsymbol{n}$ taken from a universe of size $\boldsymbol{u}$, and guarantees constant-time operations in the worst case with high probability, as before. This problem was open even in the {\em amortized} setting. One of the main ingredients of our construction is a permutation-based variant of cuckoo hashing, which significantly improves the space consumption of cuckoo hashing when dealing with a rather small universe. \end{itemize}
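As background for the construction summarized above: classic cuckoo hashing stores each key in one of two candidate slots (one per table), so a lookup probes at most two locations, and an insertion resolves collisions by evicting the occupant to its alternate slot. The following is a minimal illustrative sketch of that basic scheme only, not the paper's two-level backyard construction; the hash functions, table sizing, and eviction limit here are simplistic placeholders, whereas the paper relies on carefully chosen hash families and de-amortization to get worst-case bounds.

```python
import random

class CuckooHashTable:
    """Minimal two-table cuckoo hashing sketch: each key lives in one of
    two candidate slots, so a lookup makes at most two probes."""

    MAX_KICKS = 32  # eviction-chain limit before falling back to a rehash

    def __init__(self, size=11):
        self.size = size
        self.tables = [[None] * size, [None] * size]
        self._reseed()

    def _reseed(self):
        # Illustrative seeded hashing via Python's built-in hash();
        # the paper's analysis requires much stronger hash families.
        self.seeds = (random.randrange(1 << 30), random.randrange(1 << 30))

    def _slot(self, which, key):
        return hash((self.seeds[which], key)) % self.size

    def lookup(self, key):
        # Worst-case constant time: exactly two probes.
        return any(self.tables[i][self._slot(i, key)] == key for i in (0, 1))

    def insert(self, key):
        if self.lookup(key):
            return
        i = 0
        for _ in range(self.MAX_KICKS):
            pos = self._slot(i, key)
            if self.tables[i][pos] is None:
                self.tables[i][pos] = key
                return
            # Slot occupied: evict the occupant ("cuckoo" step); the
            # evicted key must then go to its slot in the other table.
            key, self.tables[i][pos] = self.tables[i][pos], key
            i = 1 - i
        self._rehash(key)  # eviction chain too long: rebuild (rare)

    def _rehash(self, pending):
        # Fallback path: grow, pick fresh hash functions, reinsert all keys.
        items = [k for t in self.tables for k in t if k is not None]
        items.append(pending)
        self.size = self.size * 2 + 1
        self.tables = [[None] * self.size, [None] * self.size]
        self._reseed()
        for k in items:
            self.insert(k)
```

The amortized rehash fallback above is exactly what the paper's de-amortization machinery avoids: the full construction guarantees constant time per operation in the worst case, with high probability, rather than only on average.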