Analysis of Robin Hood and Other Hashing Algorithms Under the Random Probing Model, With and Without Deletions

Combinatorics, Probability and Computing. Pub Date: 2018-08-14. DOI: 10.1017/S0963548318000408

P. V. Poblete, Alfredo Viola



Abstract

Thirty years ago, the Robin Hood collision resolution strategy was introduced for open addressing hash tables, and a recurrence equation was found for the distribution of its search cost. Although this recurrence could not be solved analytically, it allowed for numerical computations that, remarkably, suggested that the variance of the search cost approached a value of 1.883 when the table was full. Furthermore, this would imply that, by using a non-standard mean-centred search algorithm, searches could be performed in expected constant time even in a full table. In spite of the time elapsed since these observations were made, no progress has been made in proving them. In this paper we introduce a technique to work around the intractability of the recurrence equation by solving an associated differential equation instead. While this does not provide an exact solution, it is sufficiently powerful to prove a bound of π²/3 for the variance, and thus to prove that the variance of Robin Hood is bounded by a small constant for load factors arbitrarily close to 1. As a corollary, this proves that the mean-centred search algorithm runs in expected constant time. We also use this technique to study the performance of Robin Hood hash tables under a long sequence of insertions and deletions, where deletions are implemented by marking elements as deleted. We prove that, in this case, the variance is bounded by 1/(1−α), where α is the load factor. To model the behaviour of these hash tables, we use a unified approach that we also apply to the First-Come-First-Served and Last-Come-First-Served collision resolution disciplines, both with and without deletions.
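The abstract refers to Robin Hood insertion under the random probing model and to a mean-centred search. The Python sketch below is a hypothetical illustration, not the authors' code. Several parts of it are assumptions. The class name `RobinHoodTable` is illustrative. The helper `_probe` simulates random probing by deriving an independent pseudo-random slot from `(seed, key, d)`. On ties the resident key is kept. The "mean" is the rounded average of the stored probe numbers. Deletions by marking are not modeled.

```python
import random


class RobinHoodTable:
    """Hypothetical illustration: open addressing with Robin Hood insertion
    under a simulated random probing model. Keys are assumed distinct."""

    def __init__(self, size, seed=0):
        self.size = size
        self.seed = seed
        self.slots = [None] * size  # each slot: None or (key, probe_number)
        self.count = 0
        self.max_probe = 0    # largest probe number currently stored
        self.probe_total = 0  # sum of stored probe numbers, used for the mean

    def _probe(self, key, d):
        # Simulated random probing (assumption): the d-th probe of `key` is an
        # independent pseudo-random slot for each (seed, key, d) triple.
        return random.Random(f"{self.seed}:{key}:{d}").randrange(self.size)

    def _place(self, pos, key, d):
        self.slots[pos] = (key, d)
        self.probe_total += d
        self.max_probe = max(self.max_probe, d)

    def insert(self, key):
        if self.count == self.size:
            raise RuntimeError("table is full")
        self.count += 1
        d = 1
        while True:
            pos = self._probe(key, d)
            slot = self.slots[pos]
            if slot is None:
                self._place(pos, key, d)
                return
            other, j = slot
            if j < d:
                # Robin Hood rule: the key that has probed further takes the
                # slot; the displaced key becomes the one still to be placed.
                self.probe_total -= j
                self._place(pos, key, d)
                key, d = other, j
            # On ties (j == d) or when j > d, the resident keeps its slot.
            d += 1  # continue with the next probe of the key being placed

    def _centred_order(self, m):
        # Probe numbers m, m+1, m-1, m+2, m-2, ... clipped to 1..max_probe;
        # every probe number in that range is yielded exactly once.
        yield m
        for step in range(1, self.max_probe):
            if m + step <= self.max_probe:
                yield m + step
            if m - step >= 1:
                yield m - step

    def search(self, key):
        """Mean-centred search; returns (found, number_of_probes_examined)."""
        if self.count == 0:
            return False, 0
        m = max(1, round(self.probe_total / self.count))
        examined = 0
        for d in self._centred_order(m):
            examined += 1
            slot = self.slots[self._probe(key, d)]
            if slot is not None and slot[0] == key:
                return True, examined
        return False, examined


if __name__ == "__main__":
    table = RobinHoodTable(size=64, seed=1)
    for k in range(60):
        table.insert(k)
    print(table.search(17))   # (True, probes examined)
    print(table.search(999))  # (False, table.max_probe): every probe number is tried
```

Under random probing, a stored key k with probe number d sits exactly at slot h_d(k). Probe numbers can therefore be checked in any order, and this is what makes a search centred on the mean possible. When the variance of the probe numbers is small, a successful search in this sketch tends to stop after a few probes near the mean. An unsuccessful search here still examines every probe number up to `max_probe`.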

