{"title":"Long round-trip time support with shared-memory crosspoint buffered packet switch","authors":"Z. Dong, R. Rojas-Cessa","doi":"10.1109/CONECT.2005.26","DOIUrl":"https://doi.org/10.1109/CONECT.2005.26","url":null,"abstract":"The amount of memory in buffered crossbars in combined input-crosspoint buffered switches is proportional to the number of crosspoints, or O(N/sup 2/), where N is the number of ports, and to the crosspoint buffer size, which is defined by the distance between the line cards and the buffered crossbar, to achieve 100% throughput under port-rate data flows. A long distance between these two components can make a buffered crossbar costly to implement. In this paper, we propose and examine two shared-memory crosspoint buffered packet switches that use small crosspoint buffers to support a long round-trip time, which is mainly affected by the transmission delay caused by the distance between line cards and the buffered crossbar. The proposed switch reduces the required buffer memory of the buffered crossbar by 50% or more. We show that a shared-memory crosspoint buffer switch can provide high this improvement without speedup.","PeriodicalId":148282,"journal":{"name":"13th Symposium on High Performance Interconnects (HOTI'05)","volume":"147 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121609680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dror Goldenberg, Michael Kagan, Ran Ravid, Michael S. Tsirkin
{"title":"Zero copy sockets direct protocol over infiniband-preliminary implementation and performance analysis","authors":"Dror Goldenberg, Michael Kagan, Ran Ravid, Michael S. Tsirkin","doi":"10.1109/CONECT.2005.35","DOIUrl":"https://doi.org/10.1109/CONECT.2005.35","url":null,"abstract":"Sockets direct protocol (SDP) is a byte-stream transport protocol implementing the TCP SOCK/spl I.bar/STREAM semantics utilizing transport offloading capabilities of the infiniband fabric: Under the hood, SDP supports zero-copy (ZCopy) operation mode, using the infiniband RDMA capability to transfer data directly between application buffers. Alternatively, in buffer copy (BCopy) mode, data is copied to and from transport buffers. In the initial open-source SDP implementation, ZCopy mode was restricted to asynchronous I/O operations. We added a prototype ZCopy support for send()/recv() synchronous socket calls. This paper presents the major architectural aspects of the SDP protocol, the ZCopy implementation, and a preliminary performance evaluation. We show substantial benefits of ZCopy when multiple connections are running in parallel on the same host. For example, when 8 connections are simultaneously active, enabling ZCopy yields a bandwidth growth from 500 MB/s to 700 MB/s, while CPU utilization decreases 8 times.","PeriodicalId":148282,"journal":{"name":"13th Symposium on High Performance Interconnects (HOTI'05)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114892493","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Gusat, D. Craddock, W. Denzel, Antonius P. J. Engbersen, N. Ni, G. Pfister, W. Rooney, J. Duato
{"title":"Congestion control in InfiniBand networks","authors":"M. Gusat, D. Craddock, W. Denzel, Antonius P. J. Engbersen, N. Ni, G. Pfister, W. Rooney, J. Duato","doi":"10.1109/CONECT.2005.14","DOIUrl":"https://doi.org/10.1109/CONECT.2005.14","url":null,"abstract":"Driving computer interconnection networks closer to saturation minimizes cost/performance and power consumption, but requires efficient congestion control to prevent catastrophic performance degradation during traffic peaks or \"hot spot\" traffic patterns. The InfiniBand/spl trade/Architecture provides such congestion control, but lacks guidance for setting its parameters. At its adoption, it was unproven that there were any settings that would work at all, avoid instability or oscillations. This paper reports on a simulation-driven exploration of that parameter space which verifies that the architected scheme can, in fact, work properly despite inherent delays in its feedback mechanism.","PeriodicalId":148282,"journal":{"name":"13th Symposium on High Performance Interconnects (HOTI'05)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124384518","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Initial performance evaluation of the Cray SeaStar interconnect","authors":"R. Brightwell, K. Pedretti, K. Underwood","doi":"10.1109/CONECT.2005.24","DOIUrl":"https://doi.org/10.1109/CONECT.2005.24","url":null,"abstract":"The Cray SeaStar is a new network interface and router for the Cray Red Storm and XT3 supercomputer. The SeaStar was designed specifically to meet the performance and reliability needs of a large-scale, distributed-memory scientific computing platform. In this paper, we present an initial performance evaluation of the SeaStar. We first provide a detailed overview of the hardware and software features of the SeaStar, followed by the results of several low-level micro-benchmarks. These initial results indicate that SeaStar is on a path to achieving its performance targets.","PeriodicalId":148282,"journal":{"name":"13th Symposium on High Performance Interconnects (HOTI'05)","volume":"418 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115246180","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
C. Minkenberg, F. Abel, Peter Müller, R. Krishnamurthy, M. Gusat, B. Hemenway
{"title":"Control path implementation for a low-latency optical HPC switch","authors":"C. Minkenberg, F. Abel, Peter Müller, R. Krishnamurthy, M. Gusat, B. Hemenway","doi":"10.1109/CONECT.2005.15","DOIUrl":"https://doi.org/10.1109/CONECT.2005.15","url":null,"abstract":"A crucial part of any high-performance computing system is its interconnection network. In the OSMOSIS project, Corning and IBM are jointly developing a demonstrator interconnect based on optical cell switching with electronic control. Starting from the core set of requirements, we present the system design rationale and show how it impacts the practical implementation. Our focus is on solving the technical issues related to the electronic control path, and we show that it is feasible at the targeted design point.","PeriodicalId":148282,"journal":{"name":"13th Symposium on High Performance Interconnects (HOTI'05)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121181554","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SIFT: snort intrusion filter for TCP","authors":"Michael Attig, J. Lockwood","doi":"10.1109/CONECT.2005.33","DOIUrl":"https://doi.org/10.1109/CONECT.2005.33","url":null,"abstract":"Intrusion rule processing in reconfigurable hardware enables intrusion detection and prevention services to run at multiGigabit/second rates. High-level intrusion rules mapped directly into hardware separate malicious content from benign content in network traffic. Hardware parallelism allows intrusion systems to scale to support fast network links, such as OC-192 and 10 Gbps Ethernet. In this paper, a snort intrusion filter for TCP (SIFT) is presented that operates as a preprocessor to prevent benign traffic from being inspected by an intrusion monitor running Snort. Snort is a popular open-source rule-processing intrusion system. SIFT selectively forwards IP packets that contain questionable headers or defined signatures to a PC where complete rule processing is performed. SIFT alleviates the need for most network traffic from being inspected by software. Statistics, like how many packets match rules, are used to optimize rule processing systems. SIFT has been implemented and tested in FPGA hardware and used to process Internet traffic from a campus Internet backbone with live data.","PeriodicalId":148282,"journal":{"name":"13th Symposium on High Performance Interconnects (HOTI'05)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131161822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Addressing queuing bottlenecks at high speeds","authors":"S. Sushanth Kumar, J. Turner, P. Crowley","doi":"10.1109/CONECT.2005.7","DOIUrl":"https://doi.org/10.1109/CONECT.2005.7","url":null,"abstract":"Modern routers and switch fabrics can have hundreds of input and output ports running at up to 10 Gb/s; 40 Gb/s systems are starting to appear. At these rates, the performance of the buffering and queuing subsystem becomes a significant bottleneck. In high performance routers with more than a few queues, packet buffering is typically implemented using DRAM for data storage and a combination of off-chip and on-chip SRAM for storing the linked-list nodes and packet length, and the queue headers, respectively. This paper focuses on the performance bottlenecks associated with the use of off-chip SRAM. We show how the combination of implicit buffer pointers and multi-buffer list nodes can dramatically reduce the impact of buffering and queuing subsystem on queuing performance. We also show how combining it with coarse-grained scheduling can improve the performance of fair queuing algorithms, while also reducing the amount of off-chip memory and bandwidth needed. These techniques can reduce the amount of SRAM needed to hold the list nodes by a factor of 10 at the cost of about 10% wastage of the DRAM space, assuming an aggregation degree of 16.","PeriodicalId":148282,"journal":{"name":"13th Symposium on High Performance Interconnects (HOTI'05)","volume":"128 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133829844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quality of service in global grid computing","authors":"L. Valcarenghi","doi":"10.1109/CONECT.2005.31","DOIUrl":"https://doi.org/10.1109/CONECT.2005.31","url":null,"abstract":"This tutorial tries to address some of the issue related to the keywords present in Foster's grid computing definition. Specifically it tackles the problem of providing global grid computing applications with a network infrastructure able to guarantee quality of service. After reviewing the basics of grid computing, this tutorial focuses on specific network infrastructure issues. Quality of service (QoS) parameters such as throughput, delay, and resilience are considered. It is shown that how the integration of the grid programming environment with an intelligent grid network infrastructure allows to dynamically adapt the utilized computational and network resources to meet the application QoS requirements transparently to the user. Finally the performance evaluation of a specific implementation of an integrated application and network layer resilience scheme is presented.","PeriodicalId":148282,"journal":{"name":"13th Symposium on High Performance Interconnects (HOTI'05)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132033191","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
L. Valcarenghi, F. Paolucci, L. Foschini, F. Cugini, P. Castoldi
{"title":"Centralized and distributed topology discovery service implementations","authors":"L. Valcarenghi, F. Paolucci, L. Foschini, F. Cugini, P. Castoldi","doi":"10.1109/CONECT.2005.11","DOIUrl":"https://doi.org/10.1109/CONECT.2005.11","url":null,"abstract":"In global grid computing, i.e., wide area network (WAN) grid computing. Grid network services allow grid users or the programming environment to monitor the status of network resources and to reallocate them. Specifically, the network information and monitoring service (NIMS) provides up to date information on the grid network status. In this study two implementations of a specific NIMS component, i.e., the topology discovery service (TDS), are presented. The first implementation features a centralized broker that produces information for the consumers/users. In the second one, users are contemporarily producers and consumers of the required information. Both implementations are applicable to networks based on commercial routers without requiring any router protocol modification.","PeriodicalId":148282,"journal":{"name":"13th Symposium on High Performance Interconnects (HOTI'05)","volume":"590 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134364336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design and implementation of a content-aware switch using a network processor","authors":"Li Zhao, Yan Luo, L. Bhuyan, R. Iyer","doi":"10.1109/CONECT.2005.16","DOIUrl":"https://doi.org/10.1109/CONECT.2005.16","url":null,"abstract":"Cluster based server architectures have been widely used as a solution to overloading in Web servers because of their cost effectiveness, scalability and reliability. A content aware switch can be used to examine the Web requests and distribute them to the servers based on application level information. In this paper, we present the analysis, design and implementation of such a content aware switch based on an IXP2400 network processor (NP). We first analyze the mechanisms for implementing a content-aware switch and present the necessity for an NP-based solution. We then present various possibilities of workload allocation among different computation resources in an NP and discuss the design tradeoffs. Measurement results based on an IXP 2400 NP demonstrate that our NP-based switch can reduce the http processing latency by an average of 83.3% for a 1 K byte Web page, compared to a Linux-based switch. The amount of reduction increases with larger file sizes. It is also shown that the packet throughput can be improved by up to 5.7x across a range of files by taking advantage of multithreading and multiprocessing, available in the NP.","PeriodicalId":148282,"journal":{"name":"13th Symposium on High Performance Interconnects (HOTI'05)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129864773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}