A Multi-dimensional Progressive Perfect Hashing for High-Speed String Matching

Yang Xu, L. Ma, Zhaobo Liu, H. J. Chao
{"title":"A Multi-dimensional Progressive Perfect Hashing for High-Speed String Matching","authors":"Yang Xu, L. Ma, Zhaobo Liu, H. J. Chao","doi":"10.1109/ANCS.2011.33","DOIUrl":null,"url":null,"abstract":"Aho-Corasick (AC) automaton is widely used for multi-string matching in today's Network Intrusion Detection System (NIDS). With fast-growing rule sets, implementing AC automaton with a small memory without sacrificing its performance has remained challenging in NIDS design. In this paper, we propose a multi-dimensional progressive perfect hashing algorithm named P2-Hashing, which allows transitions of an AC automaton to be placed in a compact hash table without any collision. P2-Hashing is based on the observation that a hash key of each transition consists of two dimensions, namely a source state ID and an input character. When placing a transition in a hash table and causing a collision, we can change the value of a dimension of the hash key to rehash the transition to a new location of the hash table. For a given AC automaton, P2-Hashing first divides all the transitions into many small sets based on the two-dimensional values of the hash keys, and then places the sets of transitions progressively into the hash table until all are placed. Hash collisions that occurred during the insertion of a transition will only affect the transitions in the same set. The proposed P2-Hashing has many unique properties, including fast hash index generation and zero memory overhead, which are very suitable for the AC automaton operation. The feasibility and performance of P2-Hashing are investigated through simulations on the full Snort (6.4k rules) and Clam AV (54k rules) rule sets, each of which is first converted to a single AC automaton. Simulation results show that P2-Hashing can successfully construct the perfect hash table even when the load factor of the hash table is as high as 0.91.","PeriodicalId":124429,"journal":{"name":"2011 ACM/IEEE Seventh Symposium on Architectures for Networking and Communications Systems","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 ACM/IEEE Seventh Symposium on Architectures for Networking and Communications Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ANCS.2011.33","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 10

Abstract

Aho-Corasick (AC) automaton is widely used for multi-string matching in today's Network Intrusion Detection System (NIDS). With fast-growing rule sets, implementing AC automaton with a small memory without sacrificing its performance has remained challenging in NIDS design. In this paper, we propose a multi-dimensional progressive perfect hashing algorithm named P2-Hashing, which allows transitions of an AC automaton to be placed in a compact hash table without any collision. P2-Hashing is based on the observation that a hash key of each transition consists of two dimensions, namely a source state ID and an input character. When placing a transition in a hash table and causing a collision, we can change the value of a dimension of the hash key to rehash the transition to a new location of the hash table. For a given AC automaton, P2-Hashing first divides all the transitions into many small sets based on the two-dimensional values of the hash keys, and then places the sets of transitions progressively into the hash table until all are placed. Hash collisions that occurred during the insertion of a transition will only affect the transitions in the same set. The proposed P2-Hashing has many unique properties, including fast hash index generation and zero memory overhead, which are very suitable for the AC automaton operation. The feasibility and performance of P2-Hashing are investigated through simulations on the full Snort (6.4k rules) and Clam AV (54k rules) rule sets, each of which is first converted to a single AC automaton. Simulation results show that P2-Hashing can successfully construct the perfect hash table even when the load factor of the hash table is as high as 0.91.
用于高速字符串匹配的多维渐进式完美哈希
在当今的网络入侵检测系统(NIDS)中,AC (Aho-Corasick)自动机被广泛用于多字符串匹配。对于快速增长的规则集,在不牺牲性能的情况下使用小内存实现交流自动机在NIDS设计中仍然具有挑战性。在本文中,我们提出了一种称为p2 -哈希的多维渐进式完美哈希算法,该算法允许将交流自动机的转换放置在紧凑的哈希表中而不会发生任何冲突。p2 -哈希基于这样的观察:每个转换的哈希键由两个维度组成,即源状态ID和输入字符。当在哈希表中放置转换并导致碰撞时,我们可以更改哈希键的一个维度的值,以将转换重新哈希到哈希表的新位置。对于给定的AC自动机,p2 -哈希首先根据哈希键的二维值将所有转换划分为许多小集,然后将转换集逐步放入哈希表中,直到所有转换集都被放置。在插入转换期间发生的哈希冲突将只影响同一集中的转换。本文提出的p2 -哈希算法具有快速生成哈希索引和零内存开销等独特的特性,非常适合于交流自动化操作。通过对完整Snort (6.4k规则)和Clam AV (54k规则)规则集的模拟,研究了p2 -哈希的可行性和性能,每个规则集首先转换为单个AC自动机。仿真结果表明,当哈希表的负载因子高达0.91时,p2 -哈希也能成功构建完美的哈希表。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信