Optimal XOR hashing for a linearly distributed address lookup in computer networks

2005 Symposium on Architectures for Networking and Communications Systems (ANCS). Pub Date: 2005-10-26. DOI: 10.1145/1095890.1095919

Christopher J. Martinez, Wei-Ming Lin, P. Patel


Citations: 6

Abstract



Hashing algorithms have been widely adopted to provide a fast address look-up process, which involves searching a large database to find the record associated with a given key. Modern examples include address lookup in network routers to determine the outgoing forwarding link and rule matching in intrusion detection systems, which compare incoming packets against a large database. A hashing algorithm transforms the key inside each target data item into a hash value, in the hope that the hashing renders the database uniformly distributed with respect to this new hash value. When the database is already uniformly distributed key-wise, any regular hashing algorithm easily leads to a perfectly uniform distribution after hashing. If, on the other hand, the records in the database are not uniformly distributed, different hashing functions lead to different performance. This paper addresses the case in which the distribution follows a natural negative linear distribution, which is found to approximate the distributions in many applications. For this distribution, we derive a general formula for calculating the distribution variance produced by any given non-overlapped bit-grouping XOR hashing function. This distribution variance directly translates into performance variations in searching. The best XOR hashing function is determined for any given key size and any given hashing target size.
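The abstract does not give the variance formula itself, so the following is only a brute-force sketch of the quantities it names, under assumptions that are not taken from the paper: keys are n-bit integers, the "negative linear" distribution is modeled as P(key = k) proportional to 2^n - k, and a non-overlapped bit-grouping XOR hash assigns each key bit to at most one group and XORs the bits of group j into hash bit j. The function names (xor_hash, negative_linear_weights, bucket_variance) are hypothetical.

```python
from fractions import Fraction


def xor_hash(key, groups):
    """Hash bit j is the XOR (parity) of the key bits listed in groups[j].

    groups is a list of lists of key-bit positions; no position may appear
    in more than one group (the "non-overlapped" condition, as assumed here).
    """
    h = 0
    for j, bit_positions in enumerate(groups):
        parity = 0
        for b in bit_positions:
            parity ^= (key >> b) & 1
        h |= parity << j
    return h


def negative_linear_weights(n_bits):
    """Assumed model: P(key = k) is proportional to 2**n_bits - k."""
    size = 1 << n_bits
    total = size * (size + 1) // 2  # sum of size, size - 1, ..., 1
    return [Fraction(size - k, total) for k in range(size)]


def bucket_variance(n_bits, groups):
    """Variance of the bucket probabilities around the uniform value 1 / 2**m."""
    buckets = [Fraction(0)] * (1 << len(groups))
    for key, weight in enumerate(negative_linear_weights(n_bits)):
        buckets[xor_hash(key, groups)] += weight
    mean = Fraction(1, len(buckets))
    return sum((p - mean) ** 2 for p in buckets) / len(buckets)


if __name__ == "__main__":
    # Three candidate hashes from 4-bit keys to 2-bit hash values.
    candidates = {
        "high bits only": [[2], [3]],
        "low bits only": [[0], [1]],
        "folded pairs": [[0, 2], [1, 3]],
    }
    for name, groups in candidates.items():
        print(name, bucket_variance(4, groups))
```

In this toy model the folded grouping balances the four buckets exactly, while hashing on the two high-order bits alone gives the largest variance of the three candidates. The exhaustive enumeration only illustrates what "distribution variance" means for small key sizes; it does not reproduce the paper's closed-form formula or its optimal grouping.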

