{"title":"Axon: A flexible substrate for source-routed Ethernet","authors":"Jeffrey Shafer, Brent E. Stephens, Michael Foss, S. Rixner, A. Cox","doi":"10.1145/1872007.1872035","DOIUrl":"https://doi.org/10.1145/1872007.1872035","url":null,"abstract":"This paper introduces the Axon, an Ethernet-compatible device for creating large-scale datacenter networks. Axons are inexpensive, practical devices that are demonstrated using prototype hardware. Functionally, Axons replace Ethernet switches and maintain full compatibility with existing Ethernet hosts. Between themselves, however, Axons transparently use source-routed Ethernet. This unlocks many benefits, such as improved network scalability, performance, and flexibility. In an Axon network, all state required to route a host's packets is placed in the local Axon-the Axon to which the host is directly connected. Therefore, regardless of the scale of the network, the route computation and storage needs of a single Axon device only need to scale with the demands of its locally-connected hosts. This is in stark contrast to conventional switched Ethernet, which requires routing resources proportional to the traffic that flows through the device. Scalability is also increased by eliminating the use of packet flooding for automatic location and address discovery. Further, source-routed Ethernet increases network flexibility by supporting different route selection strategies. For example, shortest-path routing could be employed, or longer paths selected to minimize congestion by balancing traffic across redundant links.","PeriodicalId":262685,"journal":{"name":"2010 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129171801","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Energy-aware routing in hybrid optical network-on-chip for future Multi-Processor System-on-Chip","authors":"Lin Liu, Yuanyuan Yang","doi":"10.1145/1872007.1872029","DOIUrl":"https://doi.org/10.1145/1872007.1872029","url":null,"abstract":"With the development of Multi-Processor System-on-Chip (MP-SoC) in recent years, the intra-chip communication is becoming the bottleneck of the whole system. Current electronic network-on-chip (NoC) designs face serious challenges, such as bandwidth, latency and power consumption. Optical interconnection networks are a promising technology to overcome these problems. In this paper, we study the routing problem in optical NoCs with arbitrary network topologies. Traditionally, a minimum hop count routing policy is employed for electronic NoCs, as it minimizes both power consumption and latency. However, due to the special architecture of current optical NoC routers , such a minimum-hop path may not be energy-wise optimal. Using a detailed model of optical routers we reduce the energy-aware routing problem into a shortest-path problem, which can then be solved using one of the many well known techniques. By applying our approach to different popular topologies, we show that the energy consumed in data communication in an optical NoC can be significantly reduced. We also propose the use of optical burst switching (OBS) in optical NoCs to reduce control overhead, as well as an adaptive routing mechanism to reduce energy consumption without introducing extra latency. Our simulation results demonstrate the effectiveness of the proposed algorithms.","PeriodicalId":262685,"journal":{"name":"2010 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121047807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Software-based implementations of updateable data structures for high-speed URL matching","authors":"Haowei Yuan, Ben Wun, P. Crowley","doi":"10.1145/1872007.1872025","DOIUrl":"https://doi.org/10.1145/1872007.1872025","url":null,"abstract":"URL matching is used in many network applications, including URL blacklisting, URL-based forwarding and URL shortening services. These applications need fast URL queries and updates, thus requiring an efficient updateable data structure. As the processing power of general-purpose multi-core processors increases, software-based approaches are better able to meet the speed requirements of URL matching. In this paper, we present our preliminary performance study of finite-automata- and hash-based URL matching implementations on commodity PCs. The impacts of the cache and memory allocation methods are discussed.","PeriodicalId":262685,"journal":{"name":"2010 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS)","volume":"227 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116167273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Range hash for regular expression pre-filtering","authors":"M. Bando, N. S. Artan, Rihua Wei, Xiang-Yu Guo, H. J. Chao","doi":"10.1145/1872007.1872032","DOIUrl":"https://doi.org/10.1145/1872007.1872032","url":null,"abstract":"Recently, major Internet carriers and vendors successfully tested high-speed backbone networks at 100-Gbps line speed to support rapid growth of the Internet traffic demands. In addition, traffic is getting more concentrated to points such as data centers, and demand for protecting such high-speed networks from attack traffic is increasing. Deep Packet Inspection (DPI) with Regular Expression (RegEx) detection is the de facto defense mechanism agains network intrusions. However, current RegEx detection systems cannot keep up with the upcoming high-speed line rate. The RegExes consist of three types of components, exact strings, character classes (CC), and repetitions. Exact string and repetition matching have been widely studied by RegEx research community for better performance. Yet, although more than 55% of RegExes in Snort signature set contain at least one CC, hardware based solutions that focus on CC detection is limited. In this paper we propose a new CC detection architecture called Range Hash that is suitable for high-speed, compact CC detection. Additionally, we propose a practical application of the Range Hash architecture where it can be used as a pre-filter for a Regular Expression detection system to increase overall RegEx detection performance. Based on our hardware prototype design which runs at 250MHz, Range Hash can reach to 100-Gbps CC detection throughput with today's FPGA chips.","PeriodicalId":262685,"journal":{"name":"2010 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126036722","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Chimpp: A Click-based programming and simulation environment for reconfigurable networking hardware","authors":"Erik Rubow, R. McGeer, J. Mogul, Amin Vahdat","doi":"10.1145/1872007.1872052","DOIUrl":"https://doi.org/10.1145/1872007.1872052","url":null,"abstract":"Reconfigurable network hardware makes it easier to experiment with and prototype high-speed networking systems. However, these devices are still relatively hard to program; for example, requiring users to develop in Verilog or VHDL. Further, these devices are commonly designed to work with software on a host computer, requiring the co-development of these hardware and software components. We address this situation with Chimpp, a development environment for reconfigurable network hardware, modeled on the popular Click modular router system. Chimpp employs a modular approach to designing hardware-based packet-processing systems, featuring a simple configuration language similar to that of Click. We demonstrate this development environment by targeting the NetFPGA platform. Chimpp can be combined with Click itself at the software layer for a highly modular, mixed hardware and software design framework. We also enable the integrated simulation of the hardware and software components of a network device together with other network devices using the OMNeT++ network simulator. The goal of Chimpp is to make experimentation easy by providing a toolbox of reusable, modular elements and a way to easily combine them. In contrast with some prior work, Chimpp avoids unnecessary restrictions on module interfaces and design styles. Rather, it is easy to add custom interfaces and to incorporate existing hardware modules. We describe our design and implementation of Chimpp, and provide initial evaluations showing how Chimpp makes it easier to implement, simulate, and modify a variety of packet-processing systems on the NetFPGA platform.","PeriodicalId":262685,"journal":{"name":"2010 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125553278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fair multithreading on packet processors for scalable network virtualization","authors":"Qiang Wu, S. Shanbhag, T. Wolf","doi":"10.1145/1872007.1872009","DOIUrl":"https://doi.org/10.1145/1872007.1872009","url":null,"abstract":"Network virtualization requires careful control of networking resources, including link bandwidth, router memory, and packet processing time. Isolation and fair sharing of processing resources in current high-performance packet processors occur at the granularity of entire processor cores. Scaling of network virtualization to larger numbers of parallel slices requires a more fine-grained processor sharing mechanism. Our work presents a novel approach, called Fair Multithreading (FMT), that allows hardware threads to share a processor core while ensuring isolation and weighted fair access. We present an analysis of the FMT algorithm and a prototype implementation on a NetFPGA system. Our evaluation results indicate that FMT can be implemented at speeds that are necessary to make scheduling decisions at the instruction level. We show the impact of having such fine-grained processor schedulers in substrate nodes by comparing the resource utilization of virtual network slices in our system to traditional whole-core allocations. Our simulation results show the FMT-based substrate networks can be utilized more efficiently and more virtual network requests can be accommodated. These results indicate the significant improvement in system scalability that can be gained from our fine-grained processor scheduling system.","PeriodicalId":262685,"journal":{"name":"2010 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS)","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130461681","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient packet classification algorithm based on entropy","authors":"Michal Kajan, J. Korenek","doi":"10.1145/1872007.1872021","DOIUrl":"https://doi.org/10.1145/1872007.1872021","url":null,"abstract":"This paper deals with packet classification in high-speed networks. It introduces a novel method for packet classification based on the amount of information stored in the ruleset. Basic principles of the algorithm based on the effort to reduce the amount of the necessary memory space and number of computational steps are presented together with analysis of the input rulesets.","PeriodicalId":262685,"journal":{"name":"2010 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS)","volume":"219 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126862339","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DOS - A scalable optical switch for datacenters","authors":"Xiaohui Ye, Yawei Yin, S. Yoo, P. Mejia, R. Proietti, V. Akella","doi":"10.1145/1872007.1872037","DOIUrl":"https://doi.org/10.1145/1872007.1872037","url":null,"abstract":"This paper discusses the architecture and performance studies of Datacenter Optical Switch (DOS) designed for scalable and high-throughput interconnections within a data center. DOS exploits wavelength routing characteristics of a switch fabric based on an Arrayed Waveguide Grating Router (AWGR) that allows contention resolution in the wavelength domain. Simulation results indicate that DOS exhibits lower latency and higher throughput even at high input loads compared with electronic switches or previously proposed optical switch architectures such as OSMOSIS [4, 5] and Data Vortex [6, 7]. Such characteristics, together with very high port count on a single switch fabric make DOS attractive for data center applications where the traffic patterns are known to be bursty with high temporary peaks [13]. DOS exploits the unique characteristics of the AWGR fabric to reduce the delay and complexity of arbitration. We present a detailed analysis of DOS using a cycle-accurate network simulator. The results show that the latency of DOS is almost independent of the number of input ports and does not saturate even at very high (approx 90%) input load. Furthermore, we show that even with 2 to 4 wavelengths, the performance of DOS is significantly better than an electrical switch network based on state-of-the-art flattened butterfly topology.","PeriodicalId":262685,"journal":{"name":"2010 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS)","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115079343","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bit-shuffled trie: A new approach for IP address lookup","authors":"D. Pao, Ziyan Lu","doi":"10.1145/1872007.1872020","DOIUrl":"https://doi.org/10.1145/1872007.1872020","url":null,"abstract":"IP address lookup is a fundamental operation in packet forwarding. Using multi-level index tables to find out the next-hop value is an attractive approach due to its simplicity. However, memory efficiency is relatively low because prefixes are sparsely distributed in the address space. In this poster, we shall outline a new approach to construct memory efficient index tables based on a technique called bit-shuffling. The proposed method is evaluated using a real-life IPv4 routing table with 321K prefixes. The lookup tables occupy 0.8MB memory.","PeriodicalId":262685,"journal":{"name":"2010 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS)","volume":"152 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134030747","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A folded pipeline network processor architecture for 100 Gbit/s networks","authors":"Kimon Karras, Thomas Wild, A. Herkersdorf","doi":"10.1145/1872007.1872010","DOIUrl":"https://doi.org/10.1145/1872007.1872010","url":null,"abstract":"Ethernet, although initially conceived as a Local Area Network technology, has been steadily making inroads into access and core networks. This has led to a need for higher link speeds, which are now reaching 100 Gbit/s. Packet processing at this rate represents a significant challenge, that needs to be met efficiently, while minimizing power consumption and chip area. This level of throughput favours a pipelined approach, thus this paper takes a traditional pipeline and breaks it down to mini-pipelines, which can perform coarse-grained processing (like process an MPLS label to completion). These mini-pipelines are then parallelized and used to construct a folded pipeline architecture, which augments the traditional approach by significantly reducing power consumption, a key problem in future routers. The paper compares the two approaches, discusses their advantages and disadvantages and demonstrates by quantitative measures that the folded pipeline architecture is the better solution for 100 Gbit/s processing.","PeriodicalId":262685,"journal":{"name":"2010 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS)","volume":"37 9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121166243","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}