Backyard Cuckoo Hashing: Constant Worst-Case Operations with a Succinct Representation
Yuriy Arbitman, M. Naor, G. Segev
2010 IEEE 51st Annual Symposium on Foundations of Computer Science
DOI: 10.1109/FOCS.2010.80
Citations: 88
Abstract
The performance of a dynamic dictionary is measured mainly by its update time, lookup time, and space consumption. In terms of update time and lookup time, there are known constructions that guarantee constant-time operations in the worst case with high probability, and in terms of space consumption, there are known constructions that use essentially optimal space. However, although the first analysis of a dynamic dictionary dates back more than 45 years (to Knuth's analysis of linear probing in 1963), the trade-off between these aspects of performance is still not completely understood. In this paper we settle two fundamental open problems: \begin{itemize} \item We construct the first dynamic dictionary that enjoys the best of both worlds: it stores $\boldsymbol{n}$ elements using $\boldsymbol{(1 + \epsilon) n}$ memory words, and guarantees constant-time operations in the worst case with high probability. Specifically, for any $\boldsymbol{\epsilon = \Omega( (\log \log n / \log n)^{1/2} )}$ and for any sequence of polynomially many operations, with high probability over the randomness of the initialization phase, all operations are performed in constant time, independent of $\boldsymbol{\epsilon}$. The construction is a two-level variant of cuckoo hashing, augmented with a ``backyard'' that handles a large fraction of the elements, together with a de-amortized perfect hashing scheme for eliminating the dependency on $\boldsymbol{\epsilon}$. \item We present a variant of the above construction that uses only $\boldsymbol{(1 + o(1))\mathcal{B}}$ bits, where $\boldsymbol{\mathcal{B}}$ is the information-theoretic lower bound for representing a set of size $\boldsymbol{n}$ taken from a universe of size $\boldsymbol{u}$, and guarantees constant-time operations in the worst case with high probability, as before. This problem was open even in the {\em amortized} setting. One of the main ingredients of our construction is a permutation-based variant of cuckoo hashing, which significantly improves the space consumption of cuckoo hashing when dealing with a rather small universe. \end{itemize}
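As background for the construction summarized above: classic cuckoo hashing stores each key in one of two candidate slots (one per table), so a lookup probes at most two locations, and an insertion resolves collisions by evicting the occupant to its alternate slot. The following is a minimal illustrative sketch of that basic scheme only, not the paper's two-level backyard construction; the hash functions, table sizing, and eviction limit here are simplistic placeholders, whereas the paper relies on carefully chosen hash families and de-amortization to get worst-case bounds.

```python
import random

class CuckooHashTable:
    """Minimal two-table cuckoo hashing sketch: each key lives in one of
    two candidate slots, so a lookup makes at most two probes."""

    MAX_KICKS = 32  # eviction-chain limit before falling back to a rehash

    def __init__(self, size=11):
        self.size = size
        self.tables = [[None] * size, [None] * size]
        self._reseed()

    def _reseed(self):
        # Illustrative seeded hashing via Python's built-in hash();
        # the paper's analysis requires much stronger hash families.
        self.seeds = (random.randrange(1 << 30), random.randrange(1 << 30))

    def _slot(self, which, key):
        return hash((self.seeds[which], key)) % self.size

    def lookup(self, key):
        # Worst-case constant time: exactly two probes.
        return any(self.tables[i][self._slot(i, key)] == key for i in (0, 1))

    def insert(self, key):
        if self.lookup(key):
            return
        i = 0
        for _ in range(self.MAX_KICKS):
            pos = self._slot(i, key)
            if self.tables[i][pos] is None:
                self.tables[i][pos] = key
                return
            # Slot occupied: evict the occupant ("cuckoo" step); the
            # evicted key must then go to its slot in the other table.
            key, self.tables[i][pos] = self.tables[i][pos], key
            i = 1 - i
        self._rehash(key)  # eviction chain too long: rebuild (rare)

    def _rehash(self, pending):
        # Fallback path: grow, pick fresh hash functions, reinsert all keys.
        items = [k for t in self.tables for k in t if k is not None]
        items.append(pending)
        self.size = self.size * 2 + 1
        self.tables = [[None] * self.size, [None] * self.size]
        self._reseed()
        for k in items:
            self.insert(k)
```

The amortized rehash fallback above is exactly what the paper's de-amortization machinery avoids: the full construction guarantees constant time per operation in the worst case, with high probability, rather than only on average.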