LaFA: lookahead finite automata for scalable regular expression detection

M. Bando, S. Artan, Jonathan Chao
{"title":"LaFA: lookahead finite automata for scalable regular expression detection","authors":"M. Bando, S. Artan, Jonathan Chao","doi":"10.1145/1882486.1882496","DOIUrl":null,"url":null,"abstract":"Although Regular Expressions (RegExes) have been widely used in network security applications, their inherent complexity often limits the total number of RegExes that can be detected using a single chip for a reasonable throughput. This limit on the number of RegExes impairs the scalability of today's RegEx detection systems. The scalability of existing schemes is generally limited by the traditional per character state processing and state transition detection paradigm. The main focus of existing schemes is in optimizing the number of states and the required transitions, but not the suboptimal character-based detection method. Furthermore, the potential benefits of reduced number of operations and states using out-of-sequence detection methods have not been explored. In this paper, we propose Looka-head Finite Automata (LaFA) to perform scalable RegEx detection using very small amount of memory. LaFA's memory requirement is very small due to the following three areas of effort described in this paper: (1) Different parts of a RegEx, namely RegEx components, are detected using different detectors, each of which is specialized and optimized for the detection of a certain RegEx component. (2) We systematically reorder the RegEx component detection sequence, which provides us with new possibilities for memory optimization. (3) Many redundant states in classical finite automata are identified and eliminated in LaFA. Our simulations show that LaFA requires an order of magnitude less memory compared to today's state-of-the-art RegEx detection systems. A single commodity Field Programmable Gate Array (FPGA) chip can accommodate up to twenty-five thousand (25k) RegExes. Based on the throughput of our LaFA prototype on FPGA, we estimated that a 34-Gbps throughput can be achieved.","PeriodicalId":329300,"journal":{"name":"Symposium on Architectures for Networking and Communications Systems","volume":"34 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2009-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Symposium on Architectures for Networking and Communications Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1882486.1882496","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 11

Abstract

Although Regular Expressions (RegExes) have been widely used in network security applications, their inherent complexity often limits the total number of RegExes that can be detected using a single chip for a reasonable throughput. This limit on the number of RegExes impairs the scalability of today's RegEx detection systems. The scalability of existing schemes is generally limited by the traditional per character state processing and state transition detection paradigm. The main focus of existing schemes is in optimizing the number of states and the required transitions, but not the suboptimal character-based detection method. Furthermore, the potential benefits of reduced number of operations and states using out-of-sequence detection methods have not been explored. In this paper, we propose Looka-head Finite Automata (LaFA) to perform scalable RegEx detection using very small amount of memory. LaFA's memory requirement is very small due to the following three areas of effort described in this paper: (1) Different parts of a RegEx, namely RegEx components, are detected using different detectors, each of which is specialized and optimized for the detection of a certain RegEx component. (2) We systematically reorder the RegEx component detection sequence, which provides us with new possibilities for memory optimization. (3) Many redundant states in classical finite automata are identified and eliminated in LaFA. Our simulations show that LaFA requires an order of magnitude less memory compared to today's state-of-the-art RegEx detection systems. A single commodity Field Programmable Gate Array (FPGA) chip can accommodate up to twenty-five thousand (25k) RegExes. Based on the throughput of our LaFA prototype on FPGA, we estimated that a 34-Gbps throughput can be achieved.
用于可扩展正则表达式检测的前瞻有限自动机
尽管正则表达式(RegExes)已广泛用于网络安全应用程序,但其固有的复杂性通常限制了使用单个芯片可以检测到的RegExes的总数,以达到合理的吞吐量。这种对RegEx数量的限制削弱了当今RegEx检测系统的可伸缩性。现有方案的可扩展性通常受到传统的逐字符状态处理和状态转移检测范式的限制。现有方案的主要重点是优化状态的数量和所需的转换,而不是次优的基于字符的检测方法。此外,使用无序检测方法减少操作和状态数量的潜在好处尚未得到探讨。在本文中,我们提出了Looka-head Finite Automata (LaFA)来使用非常少的内存执行可扩展的RegEx检测。由于本文描述了以下三个方面的努力,LaFA的内存需求非常小:(1)使用不同的检测器检测RegEx的不同部分,即RegEx组件,每个检测器都是专门针对某个RegEx组件进行检测和优化的。(2)系统地重新排序了RegEx分量检测序列,为内存优化提供了新的可能性。(3)经典有限自动机的冗余状态在LaFA中被识别和消除。我们的模拟表明,与当今最先进的RegEx检测系统相比,LaFA需要的内存少了一个数量级。单个商品现场可编程门阵列(FPGA)芯片可以容纳多达25,000 (25k) regex。基于我们在FPGA上的LaFA原型的吞吐量,我们估计可以实现34 gbps的吞吐量。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信