{"title":"Ruler: high-speed packet matching and rewriting on NPUs","authors":"Tomás Hrubý, K. V. Reeuwijk, H. Bos","doi":"10.1145/1323548.1323550","DOIUrl":"https://doi.org/10.1145/1323548.1323550","url":null,"abstract":"Programming specialized network processors (NPU) is inherently difficult. Unlike mainstream processors where architectural features such as out-of-order execution and caches hide most of the complexities of efficient program execution, programmers of NPUs face a 'bare-metal' view of the architecture. They have to deal with a multithreaded environment with a high degree of parallelism, pipelining and multiple, heterogeneous, execution units and memory banks. Software development on such architectures is expensive. Moreover, different NPUs, even within the same family, differ considerably in their architecture, making portability of the software a major concern. At the same time expensive network processing applications based on deep packet inspection are both in-creasingly important and increasingly difficult to realize due to high link rates. They could potentially benefit greatly from the hardware features offered by NPUs, provided they were easy to use. We therefore propose to use more abstract programming models that hide much of the complexity of 'bare-metal' architectures from the programmer. In this paper, we present one such programming model: Ruler, a flexible high-level language for deep packet in-spection (DPI) and packet rewriting that is easy to learn, platform independent and lets the programmer concentrate on the functionality of the application. Ruler provides packet matching and rewrit-ing based on regular expressions. We describe our implementa-tion on the Intel IXP2xxx NPU and show how it provides versatile packet processing at gigabit line rates.","PeriodicalId":329300,"journal":{"name":"Symposium on Architectures for Networking and Communications Systems","volume":"136 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116390935","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
John Giacomoni, J. Bennett, Antonio Carzaniga, D. Sicker, Manish Vachharajani, A. Wolf
{"title":"Frame shared memory: line-rate networking on commodity hardware","authors":"John Giacomoni, J. Bennett, Antonio Carzaniga, D. Sicker, Manish Vachharajani, A. Wolf","doi":"10.1145/1323548.1323553","DOIUrl":"https://doi.org/10.1145/1323548.1323553","url":null,"abstract":"Network processors provide an economical programmable platform to handle the high throughput and frame rates of modern and next-generation communication systems. However, these platforms have exchanged general-purpose capabilities for performance.\u0000 This paper presents an alternative; a software network processor (Soft-NP) framework using commodity general-purpose platforms capable of high-rate and throughput sequential frame processing compatible with high-level languages and general-purpose operating systems. A cache-optimized concurrent lock free queue provides the necessary low-overhead core-to-core communication for sustained sequential frame processing beyond the realized 1.41 million frames per second (Gigabit Ethernet) while permitting perframe processing time expansion with pipeline parallelism.","PeriodicalId":329300,"journal":{"name":"Symposium on Architectures for Networking and Communications Systems","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116277441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design of a network architecture with inherent data path security","authors":"T. Wolf","doi":"10.1145/1323548.1323556","DOIUrl":"https://doi.org/10.1145/1323548.1323556","url":null,"abstract":"Next-generation Internet architectures require designs with inherent security guarantees.We present a network architecture that uses credentials to audit traffic in the data path, where defenses can be employed often more quickly and efficiently than on end-systems.","PeriodicalId":329300,"journal":{"name":"Symposium on Architectures for Networking and Communications Systems","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129860655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An improved algorithm to accelerate regular expression evaluation","authors":"M. Becchi, P. Crowley","doi":"10.1145/1323548.1323573","DOIUrl":"https://doi.org/10.1145/1323548.1323573","url":null,"abstract":"Modern network intrusion detection systems need to perform regular expression matching at line rate in order to detect the occurrence of critical patterns in packet payloads. While deterministic finite automata (DFAs) allow this operation to be performed in linear time, they may exhibit prohibitive memory requirements. In [9], Kumar et al. propose Delayed Input DFAs (D2FAs), which provide a trade-off between the memory requirements of the compressed DFA and the number of states visited for each character processed, which corresponds directly to the memory bandwidth required to evaluate regular expressions.\u0000 In this paper we introduce a general compression technique that results in at most 2N state traversals when processing a string of length N. In comparison to the D2FA approach, our technique achieves comparable levels of compression, with lower provable bounds on memory bandwidth (or greater compression for a given bandwidth bound). Moreover, our proposed algorithm has lower complexity, is suitable for scenarios where a compressed DFA needs to be dynamically built or updated, and fosters locality in the traversal process. Finally, we also describe a novel alphabet reduction scheme for DFA-based structures that can yield further dramatic reductions in data structure size.","PeriodicalId":329300,"journal":{"name":"Symposium on Architectures for Networking and Communications Systems","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133209089","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DPICO: a high speed deep packet inspection engine using compact finite automata","authors":"Christopher L. Hayes, Yan Luo","doi":"10.1145/1323548.1323579","DOIUrl":"https://doi.org/10.1145/1323548.1323579","url":null,"abstract":"Deep Packet Inspection (DPI)has been widely adopted in detecting network threats such as intrusion, viruses and spam. It is challenging, however, to achieve high speed DPI due to the expanding rule sets and ever increasing line rates. A key issue is that the size of the finite automata falls beyond the capacity of on-chip memory thus incurring expensive off-chip accesses. In this paper we present DPICO a hardware based DPI engine that utilizes novel techniques to minimize the storage requirements for finite automata. The techniques proposed are modified content addressable memory (mCAM), interleaved memory banks, and data packing. The experiment results show the scalable performance of DPICO can achieve up to 17.7 Gbps throughput using a contemporary FPGA chip. Experiment data also show that a DPICO based accelerator can improve the pattern matching performance of a DPI server by up to 10 times.","PeriodicalId":329300,"journal":{"name":"Symposium on Architectures for Networking and Communications Systems","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126916485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Compiling PCRE to FPGA for accelerating SNORT IDS","authors":"A. Mitra, W. Najjar, L. Bhuyan","doi":"10.1145/1323548.1323571","DOIUrl":"https://doi.org/10.1145/1323548.1323571","url":null,"abstract":"Deep Payload Inspection systems like SNORT and BRO utilize regular expression for their rules due to their high expressibility and compactness. The SNORT IDS system uses the PCRE Engine for regular expression matching on the payload. The software based PCRE Engine utilizes an NFA engine based on certain opcodes which are determined by the regular expression operators in a rule. Each rule in the SNORT ruleset is translated by PCRE compiler into an unique regular expression engine. Since the software based PCRE engine can match the payload with a single regular expression at a time, and needs to do so for multiple rules in the ruleset, the throughput of the SNORT IDS system dwindles as each packet is processed through a multitude of regular expressions.\u0000 In this paper we detail our implementation of hardware based regular expression engines for the SNORT IDS by transforming the PCRE opcodes generated by the PCRE compiler from SNORT regular expression rules. Our compiler generates VHDL code corresponding to the opcodes generated for the SNORT regular expression rules. We have tuned our hardware implementation to utilize an NFA based regular expression engine, using greedy quantifiers, in much the same way as the software based PCRE engine. Our system implements a regular expression only once for each new rule in the SNORT ruleset, thus resulting in a fast system that scales well with new updates. We implement two hundred PCRE engines based on a plethora of SNORT IDS rules, and use a Virtex-4 LX200 FPGA, on the SGI RASC RC 100 Blade connected to the SGI ALTIX 4700 supercomputing system as a testbed. We obtain an interface through-put of (12.9 GBits/s) and also a maximum speedup of 353X over software based PCRE execution.","PeriodicalId":329300,"journal":{"name":"Symposium on Architectures for Networking and Communications Systems","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127534318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimization of pattern matching algorithm for memory based architecture","authors":"Cheng-Hung Lin, Yunfang Tai, Shih-Chieh Chang","doi":"10.1145/1323548.1323551","DOIUrl":"https://doi.org/10.1145/1323548.1323551","url":null,"abstract":"Due to the advantages of easy re-configurability and scalability, the memory-based string matching architecture is widely adopted by network intrusion detection systems (NIDS). In order to accommodate the increasing number of attack patterns and meet the throughput requirement of networks, a successful NIDS system must have a memory-efficient pattern-matching algorithm and hardware design. In this paper, we propose a memory-efficient pattern-matching algorithm which can significantly reduce the memory requirement. For total Snort string patterns, the new algorithm achieves 29% of memory reduction compared with the traditional Aho-Corasick algorithm [5]. Moreover, since our approach is orthogonal to other memory reduction approaches, we can obtain substantial gain even after applying the existing state-of-the-art algorithms. For example, after applying the bit-split algorithm [9], we can still gain an additional 22% of memory reduction.","PeriodicalId":329300,"journal":{"name":"Symposium on Architectures for Networking and Communications Systems","volume":"412 21","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134289669","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sheng-Ya Lin, Cheng-Chung Tan, Jyh-Charn S. Liu, Michael J. Oehler
{"title":"High-speed detection of unsolicited bulk emails","authors":"Sheng-Ya Lin, Cheng-Chung Tan, Jyh-Charn S. Liu, Michael J. Oehler","doi":"10.1145/1323548.1323577","DOIUrl":"https://doi.org/10.1145/1323548.1323577","url":null,"abstract":"We propose a Progressive Email Classifier (PEC) for high-speed classification of message patterns that are commonly associated with unsolicited bulk email (UNBE). PEC is designed to operate at the network access point, the ingress between the Internet Service Provider (ISP) and the enterprise network; so that a surge of UNBE containing fresh patterns can be detected before they spread into the enterprise network. A real-time scoreboard keeps track of detected feature instances (FI) based on a scoring and aging engine, until they are considered either from valid or UNBE sources. A FI of a valid email is discarded, but an anomalous one is passed to a blacklist to control (e.g., block or defer) subsequent emails containing the FI.\u0000 The anomaly detector of PEC can be used at different protocol layers. To gain some insights on the performance of PEC, we implemented PEC and integrated it with the sendmail daemon to detect anomalous URL links from email streams. Arbitrarily chosen on-line texts and URL links extracted from a corpus of spamming-phishing emails were used to compose testing emails. Experimental results on a Xeon based server show that PEC can handle 1.2M score/age updates, parse 0.9M URL links (of average size 30 bytes) for hashing and matching, and parsing of 25,000 email bodies of average size 1.5kB per second. The lossy detection system can be easily scaled by progressive selection of detection features and detection thresholds. It can be used alone or as an early screening tool for an existing infrastructure to defeat major UNBE flooding.","PeriodicalId":329300,"journal":{"name":"Symposium on Architectures for Networking and Communications Systems","volume":"133 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114853514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Experimenting with buffer sizes in routers","authors":"N. Beheshti, Jad Naous, Y. Ganjali, N. McKeown","doi":"10.1145/1323548.1323557","DOIUrl":"https://doi.org/10.1145/1323548.1323557","url":null,"abstract":"Recent theoretical results in buffer sizing research suggest that core Internet routers can achieve high link utilization, if they are capable of storing only a handful of packets. The underlying assumption is that the traffic is non-bursty, and that the system is operated below 85-90% utilization.\u0000 In this paper, we present a test-bed for buffer sizing experiments using NetFPGA [2], a PCI-form factor board that contains reprogrammable FPGA elements, and four Gigabit Ethernet interfaces. We have designed and implemented a NetFPGA-based Ethernet switch with finely tunable buffer sizes, and an event capturing system to monitor buffer occupancies inside the switch. We show that reducing buffer sizes down to 20-50 packets does not necessarily degrade system performance.","PeriodicalId":329300,"journal":{"name":"Symposium on Architectures for Networking and Communications Systems","volume":"2013 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128170160","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Frame-aggregated concurrent matching switch","authors":"Bill Lin, I. Keslassy","doi":"10.1145/1323548.1323568","DOIUrl":"https://doi.org/10.1145/1323548.1323568","url":null,"abstract":"Network operators need high-capacity router architectures that can offer scalability, provide throughput and performance guarantees, and maintain packet ordering. However, previous router architectures based on centralized crossbar-based architectures cannot scale to fast line rates and high port counts. Recently, a new scalable router architecture called the Concurrent Matching Switch (CMS)[5]was introduced that offers scalability by utilizing a fully distributed architecture based on two identical stages of fixed configuration meshes. It has been shown that fixed configuration meshes can be scaled to very fast line rates and highport counts via optical implementations.It has also been shown that the CMS architecture can achieve 100% through-put and packet ordering with only sequential hardware and O (1) amortized time complexity operations at each linecard. However, no delay performance guarantees have been shown for CMS.\u0000 In this paper, we demonstrate a general delay performance guarantee for CMS.Based on this guarantee, we propose a novel frame-based CMS architecture that can achieve a performance guarantee of O (N log N)average packet delay, where N is the number of switch ports, while retaining scalability,throughput guarantees, packet ordering, and O (1) time complexity. This architecture improves upon the best previously-known average delay bound of O (N 2)given these switch properties. We further introduce several alternative frame-based CMS architectures.","PeriodicalId":329300,"journal":{"name":"Symposium on Architectures for Networking and Communications Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128763410","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}