TCCluster: A Cluster Architecture Utilizing the Processor Host Interface as a Network Interconnect

2010 IEEE International Conference on Cluster Computing. Pub Date: 2010-09-20. DOI: 10.1109/CLUSTER.2010.37

Heiner Litz, M. Thürmer, U. Brüning


Citations: 5

Abstract

So far, large computing clusters consisting of several thousand machines have been built by connecting nodes with interconnect technologies such as Ethernet, InfiniBand or Myrinet. We propose an entirely new architecture called Tightly Coupled Cluster (TCCluster) that instead uses the native host interface of the processors as a direct network interconnect. By virtually integrating the network interface adapter into the processor, this approach offers higher bandwidth and much lower communication latency than the traditional approaches. The technique requires neither modifications to the processor nor additional hardware: it is purely software based, using commodity off-the-shelf AMD processors and exploiting their HyperTransport host interface as the cluster interconnect. In this paper, we explain how nodes are addressed in such a cluster, how routing works within the system, and which programming model can be applied. We describe in detail the tasks that need to be addressed and provide a proof-of-concept implementation. To evaluate the technique, we present a two-node TCCluster prototype, for which we developed BIOS firmware, a custom Linux kernel and a small message library. Microbenchmarks show a sustained bandwidth of up to 2500 MB/s for messages as small as 64 bytes and a communication latency of 227 ns between two nodes, outperforming other high-performance networks by an order of magnitude.
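To put the reported figures in perspective, the following is a small back-of-the-envelope sketch, not part of the paper. It derives the message rate implied by 2500 MB/s at 64-byte messages and the amount of data in flight during one 227 ns latency period. It assumes that MB means 10^6 bytes and that the bandwidth and latency figures can be combined this way; the function names are hypothetical.

```python
# Back-of-the-envelope arithmetic on the figures quoted in the abstract.
# Assumption: "MB" is a decimal megabyte (10**6 bytes); the paper may differ.
MB = 1_000_000


def message_rate(bandwidth_mb_s: float, message_bytes: int) -> float:
    """Messages per second needed to sustain the given bandwidth."""
    return bandwidth_mb_s * MB / message_bytes


def bytes_in_flight(bandwidth_mb_s: float, latency_ns: float) -> float:
    """Bandwidth-delay product: bytes sent during one latency period."""
    return bandwidth_mb_s * MB * latency_ns * 1e-9


if __name__ == "__main__":
    rate = message_rate(2500, 64)          # 64-byte messages at 2500 MB/s
    in_flight = bytes_in_flight(2500, 227)  # data sent during 227 ns
    print(f"message rate: {rate / 1e6:.1f} million messages/s")
    print(f"time per message: {1e9 / rate:.1f} ns")
    print(f"bytes in flight: {in_flight:.1f} ({in_flight / 64:.1f} messages)")
```

Under these assumptions, sustaining the quoted bandwidth with 64-byte messages means issuing roughly 39 million messages per second, with about nine messages in flight at any time.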


