Symposium on Architectures for Networking and Communications Systems: Latest Papers, Page 5

LaFA: lookahead finite automata for scalable regular expression detection

Symposium on Architectures for Networking and Communications Systems | Pub Date: 2009-10-19 | DOI: 10.1145/1882486.1882496

M. Bando, S. Artan, Jonathan Chao

Abstract: Although Regular Expressions (RegExes) have been widely used in network security applications, their inherent complexity often limits the total number of RegExes that can be detected using a single chip for a reasonable throughput. This limit on the number of RegExes impairs the scalability of today's RegEx detection systems. The scalability of existing schemes is generally limited by the traditional per-character state processing and state transition detection paradigm. The main focus of existing schemes is in optimizing the number of states and the required transitions, but not the suboptimal character-based detection method. Furthermore, the potential benefits of a reduced number of operations and states using out-of-sequence detection methods have not been explored. In this paper, we propose Lookahead Finite Automata (LaFA) to perform scalable RegEx detection using a very small amount of memory. LaFA's memory requirement is very small due to the following three areas of effort described in this paper: (1) Different parts of a RegEx, namely RegEx components, are detected using different detectors, each of which is specialized and optimized for the detection of a certain RegEx component. (2) We systematically reorder the RegEx component detection sequence, which provides us with new possibilities for memory optimization. (3) Many redundant states in classical finite automata are identified and eliminated in LaFA. Our simulations show that LaFA requires an order of magnitude less memory compared to today's state-of-the-art RegEx detection systems. A single commodity Field Programmable Gate Array (FPGA) chip can accommodate up to twenty-five thousand (25k) RegExes. Based on the throughput of our LaFA prototype on FPGA, we estimated that a 34-Gbps throughput can be achieved.

Citations: 11
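The component-based detection and reordering described in points (1) and (2) of the LaFA abstract can be illustrated in software. The following is a minimal Python sketch under assumptions of our own, not LaFA's actual detectors or data structures: a RegEx is restricted to literal components separated by bounded runs of arbitrary characters (roughly `lit1.{min,max}lit2...`), a plain string search stands in for the specialized detectors, and verification starts from whichever component has the fewest hits instead of the leftmost one. `Component`, `find_all` and `match_components` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """Hypothetical RegEx component: a literal plus the gap allowed before the next one."""
    literal: str      # fixed string, handled by a string-matching detector
    min_gap: int = 0  # minimum number of arbitrary characters before the next component
    max_gap: int = 0  # maximum number of arbitrary characters before the next component


def find_all(text: str, literal: str) -> list[int]:
    """Stand-in string detector: every start offset of `literal` in `text` (overlaps included)."""
    hits, i = [], text.find(literal)
    while i != -1:
        hits.append(i)
        i = text.find(literal, i + 1)
    return hits


def _extend_right(comps, hits, i, start):
    """Check that components i+1, i+2, ... follow a hit of component i at `start`."""
    cur = {start}
    while i < len(comps) - 1:
        c = comps[i]
        windows = [(p + len(c.literal) + c.min_gap, p + len(c.literal) + c.max_gap) for p in cur]
        cur = {t for t in hits[i + 1] if any(lo <= t <= hi for lo, hi in windows)}
        if not cur:
            return False
        i += 1
    return True


def _extend_left(comps, hits, i, start):
    """Check that components i-1, i-2, ... precede a hit of component i at `start`."""
    cur = {start}
    while i > 0:
        c = comps[i - 1]
        lo_off, hi_off = len(c.literal) + c.min_gap, len(c.literal) + c.max_gap
        cur = {t for t in hits[i - 1] if any(t + lo_off <= p <= t + hi_off for p in cur)}
        if not cur:
            return False
        i -= 1
    return True


def match_components(text: str, comps: list[Component]) -> bool:
    # 1. Each component detector scans the input independently (out of sequence).
    hits = [find_all(text, c.literal) for c in comps]
    if not all(hits):
        return False  # some required component never occurs
    # 2. Reordered verification: start from the component with the fewest hits.
    anchor = min(range(len(comps)), key=lambda i: len(hits[i]))
    return any(
        _extend_left(comps, hits, anchor, s) and _extend_right(comps, hits, anchor, s)
        for s in hits[anchor]
    )
```

For `abc.{2,4}xyz` one would pass `[Component("abc", 2, 4), Component("xyz")]`; `match_components("zzabc12xyz", ...)` then returns `True`, while `"abcxyz"` (a gap of 0) returns `False`. Starting from the rarest component means a frequent literal only has to be checked near the few places where a rare one occurred.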

A lock-free, cache-efficient shared ring buffer for multi-core architectures

Symposium on Architectures for Networking and Communications Systems | Pub Date: 2009-10-19 | DOI: 10.1145/1882486.1882508

P. Lee, T. Bu, Girish P. Chandranmenon

Citations: 25

Memory optimization for packet classification algorithms

Symposium on Architectures for Networking and Communications Systems | Pub Date: 2009-10-19 | DOI: 10.1145/1882486.1882524

J. Blaho, J. Korenek, V. Pus

Citations: 9

SPC-FA: synergic parallel compact finite automaton to accelerate multi-string matching with low memory

Symposium on Architectures for Networking and Communications Systems | Pub Date: 2009-10-19 | DOI: 10.1145/1882486.1882523

Junchen Jiang, Yi Tang, B. Liu, Xiaofei Wang, Yang Xu

Abstract: Deterministic Finite Automata (DFAs) are well known for their constant worst-case matching speed and are widely used in multi-string matching, a critical technique in high-performance Network Intrusion Detection System (NIDS) design. Existing DFA-based research achieves high throughput at the expense of extremely high memory cost, so it cannot be used in situations such as embedded systems, where memory resources are very tight. In this paper, we propose a memory-efficient multi-string matching acceleration scheme named the Synergic Parallel Compact (SPC) Match Engine, which provides a high matching speedup with no extra memory cost compared with the traditional DFA. Our scheme can be understood as consisting of k SPC-FAs, each of which processes one character from the input stream, thereby achieving a constant speedup factor of k with reduced memory occupation. Experimental evaluations with Snort and ClamAV rulesets show that a speedup of 9x can be practically achieved by a single SPC Match Engine instance with less memory than up-to-date DFA-based compression approaches.

Citations: 1
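The SPC-FA abstract measures its gains against a traditional DFA that performs one state transition per input character. The abstract does not give enough detail to reproduce SPC-FA itself, so the Python sketch below shows only that baseline, under our own assumptions: an Aho-Corasick automaton over bytes, expanded into a full transition table so that each input byte costs exactly one lookup. `build_dfa` and `scan` are hypothetical helper names.

```python
from collections import deque


def build_dfa(patterns: list[bytes]):
    """Compile byte-string patterns into a full DFA: delta[state][byte] -> next state."""
    goto, out = [{}], [set()]  # trie edges and the patterns ending at each state
    for pat in patterns:
        s = 0
        for b in pat:
            if b not in goto[s]:
                goto.append({})
                out.append(set())
                goto[s][b] = len(goto) - 1
            s = goto[s][b]
        out[s].add(pat)

    n = len(goto)
    delta = [[0] * 256 for _ in range(n)]  # one entry per (state, byte): the memory cost
    fail = [0] * n
    queue = deque()
    for b, nxt in goto[0].items():  # depth-1 states fail back to the root
        delta[0][b] = nxt
        queue.append(nxt)
    while queue:  # breadth-first, so every failure target is finished before it is used
        s = queue.popleft()
        out[s] |= out[fail[s]]  # inherit patterns that are suffixes of this state's string
        for b in range(256):
            nxt = goto[s].get(b)
            if nxt is None:
                delta[s][b] = delta[fail[s]][b]  # missing edge: reuse the failure state's move
            else:
                fail[nxt] = delta[fail[s]][b]
                delta[s][b] = nxt
                queue.append(nxt)
    return delta, out


def scan(delta, out, data: bytes):
    """Report (end_offset, pattern) pairs; exactly one table lookup per input byte."""
    s, hits = 0, []
    for i, b in enumerate(data):
        s = delta[s][b]
        hits.extend((i, pat) for pat in sorted(out[s]))
    return hits
```

For the textbook pattern set `he`, `she`, `his`, `hers`, scanning `b"ushers"` reports `he` and `she` ending at offset 3 and `hers` at offset 5. The table already holds 10 states × 256 = 2,560 entries for these four short patterns, which hints at why full DFAs for large Snort or ClamAV rulesets become memory-hungry.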

Software techniques to improve virtualized I/O performance on multi-core systems

Symposium on Architectures for Networking and Communications Systems | Pub Date: 2008-11-06 | DOI: 10.1145/1477942.1477971

Guangdeng Liao, Danhua Guo, L. Bhuyan, Steve R. King

Abstract: Virtualization technology is now widely deployed on high-performance networks such as 10-Gigabit Ethernet (10GE). It offers useful features like functional isolation, manageability and live migration. Unfortunately, the overhead of network I/O virtualization significantly degrades the performance of network-intensive applications. The two major sources of I/O performance loss are the extra driver domain that processes I/O requests and the extra scheduler inside the virtual machine monitor (VMM) that schedules domains.

In this paper we first examine the negative effect of virtualization in multi-core platforms with 10GE networking. We study virtualization overhead and develop two optimizations for the VMM scheduler to improve I/O performance. The first solution uses cache-aware scheduling to reduce inter-domain communication cost. The second solution steals scheduler credits to favor I/O VCPUs in the driver domain. We also propose two optimizations to improve packet processing in the driver domain. First, we redesign a simple bridge for more efficient switching of packets. Second, we develop a patch to make the transmit (TX) queue length in the driver domain configurable and adaptable to 10GE networks. Using all the above techniques, our experiments show that virtualized I/O bandwidth can be increased by 96%. Our optimizations also improve efficiency, saving 36% in core utilization per gigabit. All the optimizations are based on pure software approaches and do not hinder live migration. We believe that the findings from our study will be useful to guide future VMM development.

Citations: 79
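The efficiency figure in this abstract is a ratio, so it can be combined with the bandwidth gain. Writing U for total core utilization, B for virtualized I/O bandwidth and η for core utilization per gigabit (our notation), and assuming, beyond what the abstract states, that the 96% and 36% figures compare the same baseline and optimized configurations:

```latex
\eta = \frac{U}{B}, \qquad
\frac{B'}{B} = 1 + 0.96 = 1.96, \qquad
\frac{\eta'}{\eta} = 1 - 0.36 = 0.64

\frac{U'}{U} = \frac{\eta' B'}{\eta B} = 0.64 \times 1.96 \approx 1.25
```

Under that assumption, roughly 25% more total core time carries about twice the traffic.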

Efficient regular expression evaluation: theory to practice

Symposium on Architectures for Networking and Communications Systems | Pub Date: 2008-11-06 | DOI: 10.1145/1477942.1477950

M. Becchi, P. Crowley

Citations: 147

Packet prediction for speculative cut-through switching

Symposium on Architectures for Networking and Communications Systems | Pub Date: 2008-11-06 | DOI: 10.1145/1477942.1477957

Paul Congdon, M. Farrens, P. Mohapatra

Abstract: The amount of intelligent packet processing in an Ethernet switch continues to grow in order to support embedded applications such as network security, load balancing and quality of service assurance. This increased packet processing is contributing to greater per-packet latency through the switch.

In addition, there is a growing interest in using Ethernet switches in low-latency environments such as high-performance clusters, storage area networks and real-time media distribution. In this paper we propose Packet Prediction for Speculative Cut-through Switching (PPSCS), a novel approach to reducing the latency of modern Ethernet switches without sacrificing the feature-rich policy-based forwarding enabled by deep packet inspection.

PPSCS exploits the temporal nature of network communications to predict the flow classification of incoming packets and begin the speculative forwarding of packets before complex lookup operations are complete.

Simulation studies using actual network traces indicate that correct prediction rates of up to 97% are achievable using only a small amount of prediction circuitry per port. These studies also indicate that PPSCS can reduce the latency in traditional store-and-forward switches by nearly a factor of 8, and reduce the latency of cut-through switches by a factor of 3.

Citations: 3
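The prediction step in PPSCS can be pictured as a small cache consulted before the full lookup finishes. The Python sketch below is a hypothetical model, not the authors' circuit: each port keeps a small LRU table (its capacity is an assumed parameter) mapping a header field that arrives early in the frame, such as the destination MAC address, to the classification result of the last packet seen with that key. A hit starts speculative forwarding; when the full lookup completes, the table is updated. Recovery from a misprediction is not described in the abstract and is not modeled. `PortPredictor` and `simulate` are hypothetical names.

```python
from collections import OrderedDict


class PortPredictor:
    """Hypothetical per-port predictor: LRU table from an early header key to the last result."""

    def __init__(self, capacity: int = 4):
        self.capacity = capacity
        self.table = OrderedDict()

    def predict(self, early_key):
        """Guess the classification before the full lookup completes (None = no guess)."""
        if early_key in self.table:
            self.table.move_to_end(early_key)  # refresh LRU position on a hit
            return self.table[early_key]
        return None

    def update(self, early_key, result):
        """Train with the true result once the full lookup finishes."""
        self.table[early_key] = result
        self.table.move_to_end(early_key)
        if len(self.table) > self.capacity:
            self.table.popitem(last=False)  # evict the least recently used key


def simulate(trace, capacity: int = 4) -> float:
    """trace: iterable of (early_key, true_result); returns the fraction predicted correctly."""
    predictor, correct, total = PortPredictor(capacity), 0, 0
    for key, truth in trace:
        guess = predictor.predict(key)  # speculation would start here
        correct += guess == truth
        total += 1
        predictor.update(key, truth)    # full lookup done: verify and train
    return correct / total if total else 0.0
```

On the toy trace `[("A", 1), ("A", 1), ("B", 2), ("A", 1), ("B", 3)]`, `simulate` returns `0.4`: the two repeats of flow A are predicted correctly, while the first packet of each key and the changed result for B are not. The temporal locality the abstract relies on is what makes such a small table effective on real traffic.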

Adaptive scheduling to maximize NIC throughput in a COTS router

Symposium on Architectures for Networking and Communications Systems | Pub Date: 2008-11-06 | DOI: 10.1145/1477942.1477960

Qinghua Ye, M. MacGregor

Citations: 1

Compact architecture for high-throughput regular expression matching on FPGA

Symposium on Architectures for Networking and Communications Systems | Pub Date: 2008-11-06 | DOI: 10.1145/1477942.1477948

Y. Yang, Weirong Jiang, V. Prasanna

Abstract: In this paper we present a novel architecture for high-speed and high-capacity regular expression matching (REM) on FPGA. The proposed REM architecture, based on nondeterministic finite automata (RE-NFA), efficiently constructs regular expression matching engines (REMEs) for arbitrary regular patterns and character classes in a uniform structure, utilizing both the logic slices and the block memory (BRAM) available on modern FPGA devices. The resulting circuits take advantage of synthesis and routing optimizations to achieve high operating speed and area efficiency. The uniform structure of our RE-NFA design can be stacked in a simple way to produce multi-character input circuits that scale up throughput further. An n-state, m-character input REME takes only O(n × log₂ m) time to construct and occupies no more than O(n × m) logic units. The REMEs can be staged and pipelined in large numbers to achieve high parallelism without sacrificing clock frequency.

Using the proposed RE-NFA architecture, we are able to implement 3 copies of two-character input REMEs, each with 760 regular expressions, 18715 states and 371 character classes, onto a single Xilinx Virtex 4 LX-100-12 device. Each copy processes 2 characters per clock cycle at 300 MHz, resulting in a concurrent throughput of 14.4 Gbps for 760 REMEs. Compared with the automatic NFA-to-VHDL REME compilation [13], our approach achieves over 9x the throughput efficiency (Gbps*state/LUT). Compared with state-of-the-art REMEs on FPGA, our approach also shows up to 70% better throughput efficiency.

Citations: 120
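An RE-NFA keeps one flip-flop per NFA state, and all states update in parallel on each clock cycle. A compact software analogue is the classic bit-parallel Shift-And simulation, sketched below in Python. It is a stand-in chosen for illustration, not the authors' RE-NFA construction, and it only handles a concatenation of character classes (no `*`, `+` or `|`); each class is written as a string of its member characters, which is our own hypothetical representation. The `chars_per_cycle` parameter mimics the multi-character input idea by chaining single-character update stages within one "cycle".

```python
def compile_classes(classes: list[str]) -> dict[str, int]:
    """Map each character to the bitmask of NFA states whose class accepts it."""
    masks: dict[str, int] = {}
    for i, cls in enumerate(classes):
        for ch in cls:
            masks[ch] = masks.get(ch, 0) | (1 << i)
    return masks


def step(active: int, ch: str, masks: dict[str, int]) -> int:
    """One single-character stage: state i turns on if state i-1 was on (the start
    state is always on, hence `| 1`) and `ch` belongs to class i."""
    return ((active << 1) | 1) & masks.get(ch, 0)


def match_ends(text: str, classes: list[str], chars_per_cycle: int = 1) -> list[int]:
    """Return the end offsets of all (unanchored) matches of the class sequence."""
    masks, active, ends = compile_classes(classes), 0, []
    accept = 1 << (len(classes) - 1)  # the last state is the accepting state
    for cycle_start in range(0, len(text), chars_per_cycle):
        # A multi-character circuit chains `chars_per_cycle` stages within one clock cycle.
        for offset, ch in enumerate(text[cycle_start:cycle_start + chars_per_cycle]):
            active = step(active, ch, masks)
            if active & accept:
                ends.append(cycle_start + offset)
    return ends
```

For the classes `[ab][0-9]x`, `match_ends("za1xb2x", ["ab", "0123456789", "x"])` returns `[3, 6]` with either 1 or 2 characters per cycle. The abstract's throughput figure follows the same per-cycle arithmetic, assuming 8-bit characters: 3 copies × 2 characters per cycle × 8 bits × 300 MHz = 14.4 Gbps.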

Low power architecture for high speed packet classification

Symposium on Architectures for Networking and Communications Systems | Pub Date: 2008-11-06 | DOI: 10.1145/1477942.1477967

A. Kennedy, Xiaojun Wang, Z. Liu, B. Liu

Abstract: Today's routers need to perform packet classification at wire speed in order to provide critical services such as traffic billing, priority routing and blocking unwanted Internet traffic. With ever-increasing ruleset sizes and line speeds, implementing wire-speed packet classification with reduced power consumption remains difficult. Software approaches are unable to classify packets at wire speed as line rates reach OC-768, while state-of-the-art hardware approaches such as TCAM still consume large amounts of power.

This paper presents a low power architecture for a high speed packet classifier that can meet the OC-768 line rate. The architecture consists of an adaptive clocking unit which dynamically changes the clock speed of an energy-efficient packet classifier to match fluctuations in traffic on a router line card. It achieves this with the help of a scheme developed to keep clock frequencies at the lowest speed capable of servicing the line card while reducing frequency switches. The low power architecture has been tested on OC-48, OC-192 and OC-768 packet traces created from real-life network traces obtained from NLANR, while classifying packets using synthetic rulesets containing up to 25,000 rules. Simulation results of our classifier implemented on Cyclone III and Stratix III FPGAs, and as an ASIC, show that power savings of between 17% and 88% can be achieved by using our adaptive clocking unit rather than a fixed clock speed.

Citations: 66
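The adaptive clocking unit's stated policy, running at the lowest clock that can service the line card while limiting frequency switches, can be sketched as a small controller. The Python below is a hypothetical model, not the authors' scheme: the available frequencies, the cycles needed per packet, the 10% headroom and the `hold` window count are all assumed parameters, and hysteresis (switch up at once, switch down only after several quiet measurement windows) is one plausible way to reduce switching.

```python
def required_hz(pkts_per_sec: float, cycles_per_pkt: float, headroom: float = 1.1) -> float:
    """Clock rate needed to classify the offered load, with an assumed safety margin."""
    return pkts_per_sec * cycles_per_pkt * headroom


class AdaptiveClock:
    """Hypothetical controller: run at the slowest available clock that services the
    measured packet rate; step up at once, step down only after `hold` quiet windows."""

    def __init__(self, freqs_hz, cycles_per_pkt, hold=3):
        self.freqs = sorted(freqs_hz)
        self.cycles = cycles_per_pkt
        self.hold = hold
        self.current = self.freqs[-1]  # start at full speed, the safe default
        self.quiet = 0                 # consecutive windows where a slower clock would do
        self.pending = 0.0             # fastest clock needed during those quiet windows
        self.switches = 0

    def update(self, pkts_per_sec: float) -> float:
        """Feed one measurement window; return the clock frequency to use next."""
        need = required_hz(pkts_per_sec, self.cycles)
        target = next((f for f in self.freqs if f >= need), self.freqs[-1])
        if target > self.current:
            # Load rose: switch up immediately so packets are not dropped.
            self.current, self.quiet, self.pending = target, 0, 0.0
            self.switches += 1
        elif target < self.current:
            # Load fell: wait `hold` windows before slowing down, to avoid oscillation.
            self.quiet += 1
            self.pending = max(self.pending, target)
            if self.quiet >= self.hold:
                self.current, self.quiet, self.pending = self.pending, 0, 0.0
                self.switches += 1
        else:
            self.quiet, self.pending = 0, 0.0
        return self.current
```

With clocks of 50, 100 and 200 MHz, 4 cycles per packet and `hold=2`, the packet-rate sequence 10, 10, 20, 10, 20 Mpps yields clocks of 200, 50, 100, 100 and 100 MHz with two switches; the dip in the fourth window is too brief to trigger a down-switch.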