用于内容过滤的快速可扩展模式匹配

2005 Symposium on Architectures for Networking and Communications Systems (ANCS) Pub Date : 2005-10-26 DOI:10.1145/1095890.1095916

Sarang Dharmapurikar, J. Lockwood

{"title":"用于内容过滤的快速可扩展模式匹配","authors":"Sarang Dharmapurikar, J. Lockwood","doi":"10.1145/1095890.1095916","DOIUrl":null,"url":null,"abstract":"High-speed packet content inspection and filtering devices rely on a fast multi-pattern matching algorithm which is used to detect predefined keywords or signatures in the packets. Multi-pattern matching is known to require intensive memory accesses and is often a performance bottleneck. Hence specialized hardware-accelerated algorithms are being developed for line-speed packet processing. While several pattern matching algorithms have already been developed for such applications, we find that most of them suffer from scalability issues. To support a large number of patterns, the throughput is compromised or vice versa. We present a hardware-implementable pattern matching algorithm for content filtering applications, which is scalable in terms of speed, the number of patterns and the pattern length. We modify the classic Aho-Corasick algorithm to consider multiple characters at a time for higher throughput. Furthermore, we suppress a large fraction of memory accesses by using Bloom filters implemented with a small amount of on-chip memory. The resulting algorithm can support matching of several thousands of patterns at more than 10 Gbps with the help of a less than 50 KBytes of embedded memory and a few megabytes of external SRAM. We demonstrate the merit of our algorithm through theoretical analysis and simulations performed on Snort's string set.","PeriodicalId":417086,"journal":{"name":"2005 Symposium on Architectures for Networking and Communications Systems (ANCS)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2005-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"93","resultStr":"{\"title\":\"Fast and scalable pattern matching for content filtering\",\"authors\":\"Sarang Dharmapurikar, J. Lockwood\",\"doi\":\"10.1145/1095890.1095916\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"High-speed packet content inspection and filtering devices rely on a fast multi-pattern matching algorithm which is used to detect predefined keywords or signatures in the packets. Multi-pattern matching is known to require intensive memory accesses and is often a performance bottleneck. Hence specialized hardware-accelerated algorithms are being developed for line-speed packet processing. While several pattern matching algorithms have already been developed for such applications, we find that most of them suffer from scalability issues. To support a large number of patterns, the throughput is compromised or vice versa. We present a hardware-implementable pattern matching algorithm for content filtering applications, which is scalable in terms of speed, the number of patterns and the pattern length. We modify the classic Aho-Corasick algorithm to consider multiple characters at a time for higher throughput. Furthermore, we suppress a large fraction of memory accesses by using Bloom filters implemented with a small amount of on-chip memory. The resulting algorithm can support matching of several thousands of patterns at more than 10 Gbps with the help of a less than 50 KBytes of embedded memory and a few megabytes of external SRAM. We demonstrate the merit of our algorithm through theoretical analysis and simulations performed on Snort's string set.\",\"PeriodicalId\":417086,\"journal\":{\"name\":\"2005 Symposium on Architectures for Networking and Communications Systems (ANCS)\",\"volume\":\"24 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2005-10-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"93\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2005 Symposium on Architectures for Networking and Communications Systems (ANCS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/1095890.1095916\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2005 Symposium on Architectures for Networking and Communications Systems (ANCS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1095890.1095916","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 93

摘要

高速报文内容检测和过滤设备依靠快速多模式匹配算法来检测报文中的预定义关键字或签名。众所周知，多模式匹配需要大量内存访问，并且常常是性能瓶颈。因此，专门的硬件加速算法正在开发用于线速数据包处理。虽然已经为这类应用开发了几种模式匹配算法，但我们发现它们中的大多数都存在可伸缩性问题。为了支持大量模式，会降低吞吐量，反之亦然。我们提出了一种硬件可实现的内容过滤应用的模式匹配算法，该算法在速度、模式数量和模式长度方面具有可扩展性。我们修改了经典的Aho-Corasick算法，一次考虑多个字符以获得更高的吞吐量。此外，我们通过使用少量片上内存实现的Bloom过滤器来抑制大部分内存访问。所得到的算法可以在小于50kbytes的嵌入式存储器和几兆字节的外部SRAM的帮助下，以超过10gbps的速度支持数千个模式的匹配。我们通过对Snort的字符串集进行理论分析和模拟来证明我们算法的优点。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Fast and scalable pattern matching for content filtering

High-speed packet content inspection and filtering devices rely on a fast multi-pattern matching algorithm which is used to detect predefined keywords or signatures in the packets. Multi-pattern matching is known to require intensive memory accesses and is often a performance bottleneck. Hence specialized hardware-accelerated algorithms are being developed for line-speed packet processing. While several pattern matching algorithms have already been developed for such applications, we find that most of them suffer from scalability issues. To support a large number of patterns, the throughput is compromised or vice versa. We present a hardware-implementable pattern matching algorithm for content filtering applications, which is scalable in terms of speed, the number of patterns and the pattern length. We modify the classic Aho-Corasick algorithm to consider multiple characters at a time for higher throughput. Furthermore, we suppress a large fraction of memory accesses by using Bloom filters implemented with a small amount of on-chip memory. The resulting algorithm can support matching of several thousands of patterns at more than 10 Gbps with the help of a less than 50 KBytes of embedded memory and a few megabytes of external SRAM. We demonstrate the merit of our algorithm through theoretical analysis and simulations performed on Snort's string set.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2005 Symposium on Architectures for Networking and Communications Systems (ANCS)

自引率

0.00%

发文量