{"title":"Partition/aggregate in commodity 10G ethernet software-defined networking","authors":"R. Birke, D. Crisan, K. Barabash, A. Levin, C. DeCusatis, C. Minkenberg, M. Gusat","doi":"10.1109/HPSR.2012.6260821","DOIUrl":"https://doi.org/10.1109/HPSR.2012.6260821","url":null,"abstract":"One of the prevalent trends in emerging large scale multi-tenant datacenters is network virtualization using overlays. Here we investigate application performance degradation in such an overlay applied to commodity 10 Gigabit Ethernet networks. We have adopted partition/aggregate as a representative commercial workload that today is deployed on bare metal servers and is notoriously sensitive to latency and TCP incast congestion. Using query completion time as the primary metric, we evaluate the degree to which a software-defined network (SDN) overlay impacts this application's behavior, the performance bounds of partition/aggregate with an SDN overlay, and whether active queue management (AQM) such as random early detection (RED) can benefit this environment. We introduce a generic SDN overlay framework, which we measure in hardware and simulate using a real TCP stack extracted from FreeBSD v9, running over a detailed Layer 2 commodity 10G Ethernet fabric network simulator. To further alleviate TCP incast congestion and support legacy congestion control, we propose an AQM translation scheme called v-RED. Finally, we report results concerning SDN's benefits in addressing TCP incast. 
Contrary to our expectations, we found that latency-sensitive applications do not necessarily suffer from performance degradation when deployed over SDN overlays.","PeriodicalId":163079,"journal":{"name":"2012 IEEE 13th International Conference on High Performance Switching and Routing","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124808955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LightFlow: Speeding up GPU-based flow switching and facilitating maintenance of flow table","authors":"N. Matsumoto, M. Hayashi","doi":"10.1109/HPSR.2012.6260831","DOIUrl":"https://doi.org/10.1109/HPSR.2012.6260831","url":null,"abstract":"Flow-based switching is increasingly important in accordance with the growing demand for in-network processing for cloud applications. Flow switching performance tends to degrade in proportion to the number of flow entries. To reduce the number of flow entries, they can be aggregated by applying wildcard fields. Meanwhile, the existence of the wildcard entry adversely affects the use of a hash-based lookup on a flow table, and thus a linear search is inherent in flow switching. However, the linear search is currently the primary performance bottleneck. To date, two flow tables, one for hash-based lookup and the other for a wildcard-enabled linear search, have been used for flow switching. While hash-based table lookup is much faster than a linear search, the hash table must be updated manually for every exact-match entry. Maintaining the hash-based tables of all the flow switches is not feasible from a network operator's viewpoint. In this paper, LightFlow, a mechanism to accelerate software flow switching and relieve the burden of maintaining the flow table, is proposed. In LightFlow, two-dimensional parallelization of the linear search is introduced to accelerate lookup of the wildcard-enabled flow entries. LightFlow also introduces a mechanism that updates the hash table automatically based on the result of wildcard-aware table lookup. It thus satisfies both the need for fast table lookup and the feasibility of flow table management with a large number of wildcard entries. 
Experimental results show that LightFlow can speed up lookup of a wildcard-aware flow table three-fold or more compared with current GPU-based wildcard search mechanisms.","PeriodicalId":163079,"journal":{"name":"2012 IEEE 13th International Conference on High Performance Switching and Routing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129825567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Large-scale multi-flow regular expression matching on FPGA","authors":"Yun Qu, Y. Yang, V. Prasanna","doi":"10.1109/HPSR.2012.6260830","DOIUrl":"https://doi.org/10.1109/HPSR.2012.6260830","url":null,"abstract":"High-throughput regular expression matching (REM) over a single packet flow for deep packet inspection in routers has been well studied. In many real-world cases, however, the packet processing operations are performed on a large number of packet flows, each supported by many run-time states. To handle a large number of flows, the architecture should support a mechanism to perform rapid context switches without adversely affecting the throughput. As the number of flows increases, large-capacity memory is needed to store the per-flow matching states. In this paper, we propose a hardware-accelerated context switch mechanism for managing a large number of states in memory efficiently. With sufficiently large off-chip memory, a state-of-the-art FPGA device can be multiplexed by millions of packet flows with negligible throughput degradation for large-size packets. Post-place-and-route results show that when 8 characters are matched per cycle, our design can achieve a 180 MHz clock rate, leading to a throughput of 11.8 Gbps.","PeriodicalId":163079,"journal":{"name":"2012 IEEE 13th International Conference on High Performance Switching and Routing","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116811834","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design of a shared memory Carrier Ethernet switch compliant to Provider Backbone Bridging-Traffic Engineering (IEEE802.1Qay)","authors":"S. Mehta, Ashutosh Upadhyaya, S. Bidkar, A. Gumaste","doi":"10.1109/HPSR.2012.6260824","DOIUrl":"https://doi.org/10.1109/HPSR.2012.6260824","url":null,"abstract":"Carrier Ethernet is emerging as a new transport paradigm across metropolitan and core networks. Provider Backbone Bridging-Traffic Engineering, or PBB-TE, was standardized in the IEEE as 802.1Qay as a mechanism to provide a dedicated transport service at the Ethernet layer. This paper discusses an implementation of the PBB-TE standard using a shared-memory switch architecture, though the same architectural argument can be extended to implement MPLS-TP (the other manifestation of Carrier Ethernet). While shared-memory switch architectures have been well investigated, we present, to the best of our knowledge, the first carrier-class aggregation switch implemented in a single Field Programmable Gate Array (FPGA). This low-cost implementation paves the way for advances in Carrier Ethernet technologies to be made available to the access part of the network using rapid prototyping and commercial off-the-shelf components. The switch architecture supports multiple QoS levels and implements circuit emulation to transport traditional circuit services over a packet backbone. 
A rigorous simulation study validates our effort.","PeriodicalId":163079,"journal":{"name":"2012 IEEE 13th International Conference on High Performance Switching and Routing","volume":"181 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117159510","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"StrideBV: Single chip 400G+ packet classification","authors":"Thilan Ganegedara, V. Prasanna","doi":"10.1109/HPSR.2012.6260820","DOIUrl":"https://doi.org/10.1109/HPSR.2012.6260820","url":null,"abstract":"Hardware firewalls act as the first line of defense in protecting networks against attacks. Packets are organized into flows based on a set of packet header fields, and a predefined rule is applied to the packets in each flow to filter malicious network traffic. This is realized using packet classification, which is implemented in secure networking environments where mere best-effort delivery of packets is not adequate. Existing packet classification solutions are highly dependent on the properties (or features) of the ruleset. We present a bit vector based lookup scheme and a parallel hardware architecture that does not rely on ruleset features. A detailed performance analysis of the proposed scheme is given under different configurations. Post place-and-route results of our parallel pipelined architecture on a state-of-the-art Field Programmable Gate Array (FPGA) device show that for real-life firewall rulesets, the proposed solution achieves 400G+ throughput. To the best of our knowledge, this is the first packet classification engine that achieves a 400G+ rate on a single FPGA. 
Further, on average, our design achieves 2.5× higher power efficiency than state-of-the-art solutions.","PeriodicalId":163079,"journal":{"name":"2012 IEEE 13th International Conference on High Performance Switching and Routing","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123249288","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic pump-wavelength selection for optical packet switch with recursive parametric wavelength conversion","authors":"N. Kitsuwan, E. Oki","doi":"10.1109/HPSR.2012.6260846","DOIUrl":"https://doi.org/10.1109/HPSR.2012.6260846","url":null,"abstract":"This paper proposes a pump-wavelength selection scheme for an optical packet switch (OPS) with parametric wavelength converters (PWCs), in which the pump wavelengths are dynamically changed for all time slots and multiple PWCs are allowed to convert a wavelength in a recursive manner. This scheme is called dynamic pump-wavelength selection with recursive parametric wavelength conversion (DPS-R). A PWC, which has the advantage of multiple wavelength conversion, uses a pump wavelength that can be flexibly chosen to define which wavelengths it can convert between, called wavelength conversion pairs. The OPS allows each wavelength to be converted using a combination of available conversion pairs from more than one PWC. In the conventional scheme, pump wavelengths are statically preassigned, so the conversion pairs are fixed for all time slots. Because the pump wavelengths cannot be reconfigured, the available conversion pairs may not support some requests, which then remain unserved. DPS-R selects the pump wavelength for each PWC to maximize the number of supported wavelength conversion pairs, in both recursive and non-recursive manners. 
Numerical results via simulation show that DPS-R outperforms the conventional scheme in terms of packet loss rate.","PeriodicalId":163079,"journal":{"name":"2012 IEEE 13th International Conference on High Performance Switching and Routing","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122864367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Peer-to-peer real-time group communication over content-centric network","authors":"Vincent Wing-Hei Luk, A. Wong, C. Lea, Li Lu","doi":"10.1109/HPSR.2012.6260836","DOIUrl":"https://doi.org/10.1109/HPSR.2012.6260836","url":null,"abstract":"Real-time group communication is fundamental to many emerging interactive multimedia applications. Communication among the group members is N-to-N in that any number of group members may generate data and control packets destined to all the other members at the same time. Redundancy Reduction Gossip (RRG) is a highly effective real-time N-to-N dynamic group protocol over the Internet and demonstrates the capability to cope with churn and sporadic traffic. In this paper, a peer-to-peer (P2P) real-time group communication protocol based on RRG is proposed for Content-Centric Network (CCN). A highly effective synergy between RRG and the capabilities of CCN is exploited. Effects include connectivity expansion, traffic reduction and latency improvement. The proposed protocol allows information to be distributed from an arbitrary number of dynamic sources in a group. It has low latency and minimal membership maintenance.","PeriodicalId":163079,"journal":{"name":"2012 IEEE 13th International Conference on High Performance Switching and Routing","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128802096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalable per-flow resource management for large hierarchical networks","authors":"I. Atov, C. Kaufman, R. Perlman","doi":"10.1109/HPSR.2012.6260844","DOIUrl":"https://doi.org/10.1109/HPSR.2012.6260844","url":null,"abstract":"In this paper we present a new resource management architecture, which provides end-to-end QoS guarantees to individual flows in a large, hierarchical network, such as the global Internet. This framework provides exact reservation and guaranteed resources to individual flows and, therefore, has the same expressive power as the IntServ model, while at the same time overcoming the scalability problem of IntServ. Furthermore, in addition to regulating per-flow resource allocation, the proposed architecture enables per-flow accounting in a secure manner. We are particularly interested in building a scalable per-flow resource management architecture for the Internet, as it is known that aggregate-traffic-based services provided with DiffServ solutions have lower flexibility, utilization and performance assurance when compared to the services that can be provided with per-flow mechanisms.","PeriodicalId":163079,"journal":{"name":"2012 IEEE 13th International Conference on High Performance Switching and Routing","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127123787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Large MTUs and internet performance","authors":"David Murray, Terry Koziniec, Kevin Lee, M. Dixon","doi":"10.1109/HPSR.2012.6260832","DOIUrl":"https://doi.org/10.1109/HPSR.2012.6260832","url":null,"abstract":"Ethernet data rates have increased by many orders of magnitude since standardisation in 1982. Despite these continual data-rate increases, the 1500 byte Maximum Transmission Unit (MTU) of Ethernet remains unchanged. Experiments with varying latencies, loss rates and transaction lengths are performed to investigate the potential benefits of jumbo frames on the Internet. This study reveals that large MTUs offer throughputs much larger than a simplistic overhead analysis might suggest. The reasons for these higher throughputs are explored and discussed.","PeriodicalId":163079,"journal":{"name":"2012 IEEE 13th International Conference on High Performance Switching and Routing","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130509694","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Probing-based two-hop relay with limited packet redundancy","authors":"Jiajia Liu, Juntao Gao, Xiaohong Jiang, Hiroki Nishiyama, N. Kato","doi":"10.1109/HPSR.2012.6260834","DOIUrl":"https://doi.org/10.1109/HPSR.2012.6260834","url":null,"abstract":"Due to their simplicity and efficiency, the two-hop relay algorithm and its variants serve as a class of attractive routing schemes for mobile ad hoc networks (MANETs). With the available two-hop relay schemes, a node, whenever getting an opportunity for transmission, randomly probes a neighbor node only once for the possible transmission. It is notable that such a single-probing strategy, although simple, may result in a significant waste of the precious transmission opportunities in highly dynamic MANETs. To alleviate this limitation and utilize the limited wireless bandwidth more efficiently, this paper explores a more general probing-based two-hop relay algorithm with limited packet redundancy. In such an algorithm with probing round limit τ and packet redundancy limit f, each transmitter node is allowed to conduct up to τ rounds of probing to identify a possible receiver, and each packet can be delivered to at most f distinct relays. 
A general theoretical framework is further developed to characterize how, under different settings of τ and f, multiple probings benefit the per-node throughput capacity.","PeriodicalId":163079,"journal":{"name":"2012 IEEE 13th International Conference on High Performance Switching and Routing","volume":"363 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121383378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}