{"title":"Using the open network lab","authors":"J. Turner","doi":"10.1109/CONECT.2005.34","DOIUrl":"https://doi.org/10.1109/CONECT.2005.34","url":null,"abstract":"The Open Network Laboratory is a resource designed to enable experimental evaluation of advanced networking concepts in a realistic operating environment. The laboratory is built around a set of open-source, extensible, high performance routers, which can be accessed by remote users through a remote laboratory interface (RLI). The RLI allows users to configure the testbed network, run applications and monitor those running applications using built-in data gathering mechanisms. Support for data visualization and real-time remote display is provided. The RLI also allows users to extend, modify or replace the software running in the routers' embedded processors and to similarly extend, modify or replace the routers' packet processing hardware, which is implemented largely using field programmable gate arrays. The routers included in the testbed are architecturally similar to high performance commercial routers, enabling researchers to evaluate their ideas in a much more realistic context than can be provided by PC-based routers. The Open Network Laboratory is designed to provide a setting in which systems researchers can evaluate and refine their ideas and then demonstrate them to those interested in moving their technology into new products and services. This tutorial will teach users how to use the ONL. It will include detailed presentations on the system architecture and principles of operation, as well as live demonstrations. 
We also plan to give participants an opportunity for hands-on experience with setting up and running experiments themselves.","PeriodicalId":148282,"journal":{"name":"13th Symposium on High Performance Interconnects (HOTI'05)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127965528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Breaking the connection: RDMA deconstructed","authors":"Rajeev Sivaram, R. Govindaraju, P. Hochschild, Robert Blackmore, Piyush Chaudhary","doi":"10.1109/CONECT.2005.9","DOIUrl":"https://doi.org/10.1109/CONECT.2005.9","url":null,"abstract":"The architecture, design and performance of RDMA (remote direct memory access) over the IBM HPS (high performance switch and adapter) are described. Unlike conventional implementations such as InfiniBand, our RDMA transport model is layered on top of an unreliable datagram interface, while leaving the task of enforcing reliability to the ULP (upper layer protocol). We demonstrate that our model allows a single MPI task to deliver bidirectional bandwidth of close to 3.0 GB/s across a single link and 24.0 GB/s when striped across 8 links. In addition, we show that this transport protocol has superior attributes in terms of a) being able to handle RDMA packets coming out of order; b) being able to use multiple routes between a source-destination pair and c) reducing the size of adapter caches.","PeriodicalId":148282,"journal":{"name":"13th Symposium on High Performance Interconnects (HOTI'05)","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131870508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-speed and low-power network search engine using adaptive block-selection scheme","authors":"M. Akhbarizadeh, M. Nourani, R. Panigrahy, Samar Sharma","doi":"10.1109/CONECT.2005.20","DOIUrl":"https://doi.org/10.1109/CONECT.2005.20","url":null,"abstract":"A new approach for using block-selection scheme to increase the search throughput of multi-block TCAM-based network search engines is proposed. While the existing methods try to counter and forcibly balance the inherent bias of the Internet traffic, our method takes advantage of it. Our method improves flexibility of table management and gains scalability towards high rates of change in traffic bias. It offers higher throughput than the current art and a very low average power consumption. One of the embodiments of the proposed model, using four TCAM chips, can deliver over six times the throughput of a conventional configuration of the same TCAM chips.","PeriodicalId":148282,"journal":{"name":"13th Symposium on High Performance Interconnects (HOTI'05)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129557776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design of randomized multichannel packet storage for high performance routers","authors":"S. Sushanth Kumar, P. Crowley, J. Turner","doi":"10.1109/CONECT.2005.17","DOIUrl":"https://doi.org/10.1109/CONECT.2005.17","url":null,"abstract":"High performance routers require substantial amounts of memory to store packets awaiting transmission, requiring the use of dedicated memory devices with the density and capacity to provide the required storage economically. The memory bandwidth required for packet storage subsystems often exceeds the bandwidth of individual memory devices, making it necessary to implement packet storage using multiple memory channels. This raises the question of how to design multichannel storage systems that make effective use of the available memory and memory bandwidth, while forwarding packets at link rate in the presence of arbitrary packet retrieval patterns. A recent series of papers has demonstrated an architecture that uses on-chip SRAM to buffer packets going to/from a multichannel storage system, while maintaining high performance in the presence of worst-case traffic patterns. Unfortunately, the amount of on-chip storage required grows as the product of the number of channels and the number of separate queues served by the packet storage system. This makes it too expensive to use in systems with large numbers of queues. 
We show how to design a practical randomized packet storage system that can sustain high performance using an amount of on-chip storage that is independent of the number of queues.","PeriodicalId":148282,"journal":{"name":"13th Symposium on High Performance Interconnects (HOTI'05)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115857829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A scalable, self-routed, terabit capacity, photonic interconnection network","authors":"A. Shacham, Benjamin G. Lee, K. Bergman","doi":"10.1109/CONECT.2005.6","DOIUrl":"https://doi.org/10.1109/CONECT.2005.6","url":null,"abstract":"We present SPINet (Scalable Photonic Integrated Network), an optical switching architecture particularly designed for photonic integration. The performance of SPINet-based networks is investigated through simulations, and it is shown that SPINet can provide the bandwidth demanded by high performance computing systems while meeting the ultra-low latency and scalability requirements. Experiments are conducted on a model SOA-based switching node to verify the feasibility of the SPINet concepts, and demonstrate error-free routing of 160 Gb/s peak bandwidth payload.","PeriodicalId":148282,"journal":{"name":"13th Symposium on High Performance Interconnects (HOTI'05)","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116233913","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reconfigurable networking hardware: a classroom tool","authors":"M. Casado, G. Watson, N. McKeown","doi":"10.1109/CONECT.2005.32","DOIUrl":"https://doi.org/10.1109/CONECT.2005.32","url":null,"abstract":"We present an educational platform for teaching the design, debugging and deployment of real networking equipment in the operational Internet. The emphasis of our work is on teaching and, therefore, on providing an environment that is flexible, robust, low cost and easy to use. The platform is built around 'NetFPGAs'-custom boards containing eight Ethernet ports and two FPGAs. NetFPGA boards, when used with VNS (Virtual Network System-another tool we have developed), can be integrated into dynamically configurable network topologies reachable from the Internet. VNS enables a user-space process running on any remote computer to function as a system controller for the NetFPGA boards. NetFPGA and VNS are used at Stanford in a graduate level networking course to teach router implementation in hardware and software.","PeriodicalId":148282,"journal":{"name":"13th Symposium on High Performance Interconnects (HOTI'05)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130246966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A scalable switch for service guarantees","authors":"Bill Lin, I. Keslassy","doi":"10.1109/CONECT.2005.5","DOIUrl":"https://doi.org/10.1109/CONECT.2005.5","url":null,"abstract":"Operators need routers to provide service guarantees such as guaranteed flow rates and fairness among flows, so as to support real-time traffic and traffic engineering. However, current centralized input-queued router architectures cannot scale to fast line rates while providing these service guarantees. On the other hand, while load-balanced switch architectures that rely on two identical stages of fixed configuration switches appear to be an effective way to scale Internet routers to very high capacities, there is currently no practical and scalable solution for providing service guarantees in these architectures. In this paper, we introduce the interleaved matching switch (IMS) architecture, which relies on a novel approach to provide service guarantees using load-balanced switches. The approach is based on emulating a Birkhoff-von Neumann switch with a load-balanced switch architecture and is applicable to any admissible traffic. In cases where fixed frame sizes are applicable, we also present an efficient frame-based decomposition method. More generally, we show that the IMS architecture can be used to emulate any input queued or combined input-output queued switch.","PeriodicalId":148282,"journal":{"name":"13th Symposium on High Performance Interconnects (HOTI'05)","volume":"39 12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128204088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance characterization of a 10-Gigabit Ethernet TOE","authors":"Wu-chun Feng, P. Balaji, C. Baron, L. Bhuyan, D. Panda","doi":"10.1109/CONECT.2005.30","DOIUrl":"https://doi.org/10.1109/CONECT.2005.30","url":null,"abstract":"Though traditional Ethernet based network architectures such as Gigabit Ethernet have suffered from a huge performance difference as compared to other high performance networks (e.g., InfiniBand, Quadrics, Myrinet), Ethernet has continued to be the most widely used network architecture today. This trend is mainly attributed to the low cost of the network components and their backward compatibility with the existing Ethernet infrastructure. With the advent of 10-Gigabit Ethernet and TCP offload engines (TOEs), whether this performance gap can be bridged is an open question. In this paper, we present a detailed performance evaluation of the Chelsio T110 10-Gigabit Ethernet adapter with TOE. We have done performance evaluations in three broad categories: (i) detailed micro-benchmark performance evaluation at the sockets layer, (ii) performance evaluation of the message passing interface (MPI) stack atop the sockets interface, and (iii) application-level evaluations using the Apache Web server. Our experimental results demonstrate latency as low as 8.9 /spl mu/s and throughput of nearly 7.6 Gbps for these adapters. 
Further, we see an order-of-magnitude improvement in the performance of the Apache Web server while utilizing the TOE as compared to the basic 10-Gigabit Ethernet adapter without TOE.","PeriodicalId":148282,"journal":{"name":"13th Symposium on High Performance Interconnects (HOTI'05)","volume":"141 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134359794","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hybrid cache architecture for high speed packet processing","authors":"Z. Liu, K. Zheng, B. Liu","doi":"10.1049/iet-cdt:20060085","DOIUrl":"https://doi.org/10.1049/iet-cdt:20060085","url":null,"abstract":"The exposed memory hierarchies employed in many network processors (NPs) are expensive and hard to utilize effectively. On the other hand, a conventional cache cannot be directly incorporated into an NP either, because of its low efficiency in exploiting locality for network applications. In this paper, a novel memory hierarchy component, called split control cache, is presented. The proposed scheme employs two independent low latency memory stores to temporarily hold the flow-based and application-relevant information, exploiting the different locality behaviors exhibited by these two types of data. Data movement is manipulated by specially designed hardware to relieve the programmers from the details of memory management. Performance evaluation shows that this component can achieve a hit rate of over 90% with only 16 KB of memory for route lookup at an OC-3c link rate and provide enough flexibility for the implementation of most network applications.","PeriodicalId":148282,"journal":{"name":"13th Symposium on High Performance Interconnects (HOTI'05)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134595188","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Can memory-less network adapters benefit next-generation infiniband systems?","authors":"S. Sur, Abhinav Vishnu, Hyun-Wook Jin, Wei Huang, D. Panda","doi":"10.1109/CONECT.2005.10","DOIUrl":"https://doi.org/10.1109/CONECT.2005.10","url":null,"abstract":"InfiniBand is emerging as a high-performance interconnect. It is gaining popularity because of its high performance and open standard. Recently, PCI-Express, which is the third generation high-performance I/O bus used to interconnect peripheral devices, has been released. The third generation of InfiniBand adapters allows applications to take advantage of PCI-Express. PCI-Express offers very low latency access to the host memory by network interface cards (NICs). Earlier generation InfiniBand adapters used to have an external DIMM attached as local NIC memory, which was used to store internal information. This memory increases the overall cost of the NIC. In this paper, we design experiments and analyze the performance of various communication patterns and end applications on PCI-Express based systems, whose adapters can be chosen to run with or without local NIC memory. Our investigations reveal that on these systems, the memory fetch latency is the same for both local NIC memory and host memory. Under heavy I/O bus usage, the latency of a scatter operation increased only by 10% and only for message sizes IB -4 KB. These memory-less adapters allow more efficient use of overall system memory and show practically no performance impact (less than 0.1%) for the NAS parallel benchmarks on 8 processes. 
These results indicate that memory-less network adapters can benefit next generation InfiniBand systems.","PeriodicalId":148282,"journal":{"name":"13th Symposium on High Performance Interconnects (HOTI'05)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129772697","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}