{"title":"NetEx: Efficient and cost-effective internet bulk content delivery","authors":"M. Marcon, Nuno Santos, K. Gummadi, Nikolaos Laoutaris, P. Rodriguez, Amin Vahdat","doi":"10.1145/1872007.1872045","DOIUrl":"https://doi.org/10.1145/1872007.1872045","url":null,"abstract":"The Internet is witnessing explosive growth in traffic due to bulk content transfers, such as multimedia and software downloads, and online sharing of personal, commercial, and scientific data. Yet bulk data transfers remain very expensive and inefficient. As a result, huge amounts of digital data continue to be delivered outside of the Internet using hard drives, optical media or tapes. Meanwhile, large reserves of spare bandwidth lie unutilized in today's networks, where links are overprovisioned for peak load. We designed NetEx, a bulk transfer system that opportunistically exploits the excess capacities of network links to deliver bulk content cheaply and efficiently. Our results based on data from both a commercial tier-1 ISP and the Abilene network suggest that NetEx can considerably increase the capacity of the network, and at the same time it can provide good average performance to bulk transfers.","PeriodicalId":262685,"journal":{"name":"2010 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124996508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving PC-based OpenFlow switching performance","authors":"Voravit Tanyingyong, M. Hidell, Peter Sjödin","doi":"10.1145/1872007.1872023","DOIUrl":"https://doi.org/10.1145/1872007.1872023","url":null,"abstract":"In this paper, we propose an architectural design to improve lookup performance of OpenFlow switching in Linux using a standard commodity network interface card based on the Intel 82599 Gigabit Ethernet controller. We describe our design and report our preliminary results that show packet switching throughput increasing up to 25 percent compared to the throughput of regular software-based OpenFlow switching.","PeriodicalId":262685,"journal":{"name":"2010 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115259048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Load balancing packets on a tile-based massive multi-core processor with S-NUCA","authors":"E. Musoll","doi":"10.1145/1872007.1872047","DOIUrl":"https://doi.org/10.1145/1872007.1872047","url":null,"abstract":"In massive tile-based multi-core architectures, it is important that the execution of the packets of a particular flow takes place in a set of cores physically close to each other in order to minimize the average latency to the common data structures across the local caches of the different cores. A static NUCA implementation provides a substrate for a cost-effective implementation of a cache sharing mechanism. However, a careful mapping of the different data structures in the system's memory, along with a smart load-balancing mechanism of the packets to the different cores, is fundamental in order to avoid long latencies to remote data. This work proposes a methodology for load balancing packets to cores in an S-NUCA tile-based architecture with a large number of cores.","PeriodicalId":262685,"journal":{"name":"2010 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122144800","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient lookahead routing and header compression for multicasting in networks-on-chip","authors":"Lei Wang, Poornachandran Kumar, R. Boyapati, K. H. Yum, Eun Jung Kim","doi":"10.1145/1872007.1872028","DOIUrl":"https://doi.org/10.1145/1872007.1872028","url":null,"abstract":"As technology has advanced, Chip Multi-processor (CMP) architectures have emerged as a viable solution for designing processors. Networks-on-Chip (NOCs) provide a scalable communication method for CMP architectures as the number of cores increases. Although there has been significant research on NOC designs for unicast traffic, research on multicast router design is still in its infancy. Considering that one-to-many (multicast) and one-to-all (broadcast) traffic are more common in CMP applications, it is important to design a router providing efficient multicasting. In this paper, we propose an efficient lookahead routing with limited area overhead for a recently proposed multicast routing algorithm, Recursive Partitioning Multicast (RPM) [17]. Also, we present a novel compression scheme for a multicast packet header that becomes a big overhead in large networks. Comprehensive simulation results show that with our route computation logic design, providing lookahead routing in the multicast router costs less than 20% area overhead, and this percentage keeps decreasing with larger network sizes. Compared with the basic lookahead routing design, our design can save area by over 50%. With header compression and lookahead multicast routing, the network performance is improved by 22% in a (16 × 16) network on average.","PeriodicalId":262685,"journal":{"name":"2010 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS)","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123209101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evolution of cache replacement policies to track heavy-hitter flows","authors":"M. Zádník, M. Canini","doi":"10.1145/1872007.1872046","DOIUrl":"https://doi.org/10.1145/1872007.1872046","url":null,"abstract":"Flow-based network traffic processing, that is, processing packets based on some state information associated to the flows to which the packets belong, is a key enabler for a variety of network services and applications. This form of stateful traffic processing is used in modern switches [1] and routers that contain flow tables to implement forwarding, firewalls, NAT, QoS, and collect measurements.","PeriodicalId":262685,"journal":{"name":"2010 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS)","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130959700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Packet scheduling for deep packet inspection on multi-core architectures","authors":"T. Nelms, M. Ahamad","doi":"10.1145/1872007.1872033","DOIUrl":"https://doi.org/10.1145/1872007.1872033","url":null,"abstract":"Multi-core architectures are commonly used for network applications because the workload is highly parallelizable. Packet scheduling is a critical performance component of these applications and significantly impacts how well they scale. Deep packet inspection (DPI) applications are more complex than most network applications. This makes packet scheduling more difficult, but it can have a larger impact on performance. Also, packet latency and ordering requirements differ depending on whether the DPI application is deployed inline. Therefore, different packet scheduling tradeoffs can be made based on the deployment. In this paper, we evaluate three packet scheduling algorithms with the Protocol Analysis Module (PAM) as our DPI application using network traces acquired from production networks where intrusion prevention systems (IPS) are deployed. One of the packet scheduling algorithms we evaluate is commonly used in production applications; thus, it is useful for comparison. The other two are of our own design. Our results show that packet scheduling based on cache affinity is more important than trying to balance packets. More specifically, for the three network traces we tested, our cache affinity packet scheduler outperformed the other two schedulers increasing throughput by as much as 38%.","PeriodicalId":262685,"journal":{"name":"2010 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128851090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TED: Tool for testing and debugging uDAPL","authors":"Eva Mishra, Yogeshwar Sonawane","doi":"10.1145/1872007.1872049","DOIUrl":"https://doi.org/10.1145/1872007.1872049","url":null,"abstract":"User Direct Access Programming Library (uDAPL) defines a single set of user APIs for Remote Direct Memory Access (RDMA) capable transports. Developers of uDAPL have to write test programs for verification of APIs and for integrated testing of software stack along with underlying hardware. The tools available for testing uDAPL suffer from the following limitations: they do not provide control at API level, offer very little control of input parameters of APIs and provide limited flexibility vis-à-vis test cases that can be executed. This paper describes a new tool 'Test Environment for DAPL' (TED) that enables integrated testing and debugging of software stack and underlying hardware while providing more flexibility and control to user. It can be used over any implementation of uDAPL and is available as open source. In addition, this paper proposes a novel approach for flow control of RDMA operations. Since in RDMA operations responder side does not receive any completion, mechanisms generally rely on last byte of data buffer for notification of arrival of data. This scheme can fail if underlying transport does not ensure that data arrives in order. The proposed design ensures validity even over networks that do not guarantee in-order arrival of data.","PeriodicalId":262685,"journal":{"name":"2010 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127862385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast regular expression matching in hardware using NFA-BDD combination","authors":"D. Chasaki, T. Wolf","doi":"10.1145/1872007.1872022","DOIUrl":"https://doi.org/10.1145/1872007.1872022","url":null,"abstract":"The development of Network Intrusion Detection Systems (NIDS) is nowadays a powerful solution to defend against various network security threats. There has been a lot of research effort devoted to hardware-based NIDS, because of (1) the massive amount of computation performed by regular expression matching algorithms and (2) the gigabit per second performance requirement of modern NIDS. Hardware-based NIDS take advantage of parallelization inherent in FPGAs, ASICs or network processors to support very high network speeds, while software approaches fail to do so.","PeriodicalId":262685,"journal":{"name":"2010 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131898877","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Galois field hardware architectures for network coding","authors":"Aishwarya Nagarajan, M. Schulte, P. Ramanathan","doi":"10.1145/1872007.1872051","DOIUrl":"https://doi.org/10.1145/1872007.1872051","url":null,"abstract":"This paper presents and analyzes novel hardware designs for high-speed network coding. Our designs provide efficient methods to perform Galois field (GF) dot products and matrix inversions, which are important operations in network coding. Encoder designs that perform GF dot products and vary with respect to the number of messages combined, Galois field size, and input message size are implemented and analyzed to evaluate design tradeoffs. We investigate single cycle, multicycle, and pipelined designs with and without feedback mechanisms for encoding multiple sets of messages. The decoder is implemented as a multi-cycle design and performs GF matrix inversion followed by multiple GF dot products. Our designs are synthesized with a 65nm standard cell library and compared in terms of area, critical path delay, and throughput. Designs combining four messages achieve throughputs of more than 30 Gbps. Our designs can scale to achieve much higher throughput through the use of additional hardware.","PeriodicalId":262685,"journal":{"name":"2010 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128268059","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A new TCB cache to efficiently manage TCP sessions for web servers","authors":"Guangdeng Liao, L. Bhuyan, Wei Wu, Heeyeol Yu, Steve R. King","doi":"10.1145/1872007.1872039","DOIUrl":"https://doi.org/10.1145/1872007.1872039","url":null,"abstract":"TCP/IP, the most commonly used network protocol, consumes a significant portion of time in Internet servers. While a wide spectrum of studies has been done to reduce its processing overhead such as TOE and Direct Cache Access, most of them did studies solely from the per-packet perspective and concentrated on the packet memory access overhead. They ignored per-session data TCP Control Block (TCB), which poses a challenge in web servers with a large volume of concurrent sessions. In this paper, we start with challenge studies and show that the TCB data should be efficiently managed. We propose a new TCB cache addressed by session identifiers to address the challenge. We carefully design the TCB cache along two important axes: cache indexing and cache replacement policies. First, we study the performance of various hash functions and propose a new indexing scheme for the TCB cache by employing two Universal hash functions. We analyze session identifiers and choose some important bits as indexing bits to reduce hashing hardware complexity. Second, by leveraging characteristics of web sessions, we design a speculative cache replacement policy, which can effectively work on the TCB cache with two cache banks. Experimental results show that the new cache efficiently manages the per-session data. When it is used in TOEs or integrated into CPUs to manage the per-session data, TCP/IP processing time is significantly reduced, thus saving web server response time.","PeriodicalId":262685,"journal":{"name":"2010 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS)","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122685573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}