Improving Strong-Scaling on GPU Cluster Based on Tightly Coupled Accelerators Architecture

2015 IEEE International Conference on Cluster Computing; Pub Date: 2015-09-08; DOI: 10.1109/CLUSTER.2015.154

T. Hanawa, H. Fujii, N. Fujita, Tetsuya Odajima, Kazuya Matsumoto, Yuetsu Kodama, T. Boku


Abstract

The Tightly Coupled Accelerators (TCA) architecture that we proposed in previous work enables direct communication between accelerators across nodes. In this paper, we present a proof-of-concept GPU cluster called HA-PACS/TCA, built around the PEACH2 chip, which we designed as an interconnection router chip based on the TCA architecture. Our system demonstrated a latency of 2.0 μsec for inter-node GPU-to-GPU communication using RDMA over a PCIe Gen2 x8 link, reducing the minimum latency to just 44% of that of MPI over InfiniBand QDR with GPUDirect for RDMA. Using the Himeno benchmark, we demonstrated that the TCA architecture improved performance scalability for a small problem size by up to 61%.
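The two latency figures in the abstract imply a baseline that the abstract does not state directly. The short Python sketch below works it out: if 2.0 μsec is 44% of the InfiniBand QDR figure, the baseline is roughly 4.5 μsec. The function and variable names are illustrative, and the result is derived from the abstract's rounded numbers, not a measurement reported by the authors.

```python
# Back-of-the-envelope check of the latency figures quoted in the abstract.
# The derived baseline is an inference from rounded numbers, not a reported result.

tca_latency_us = 2.0       # TCA/PEACH2 GPU-to-GPU latency, from the abstract
ratio_to_baseline = 0.44   # TCA latency as a fraction of the InfiniBand QDR baseline


def implied_baseline_latency(latency_us: float, ratio: float) -> float:
    """Return the baseline latency implied by 'latency = ratio * baseline'."""
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    return latency_us / ratio


baseline_us = implied_baseline_latency(tca_latency_us, ratio_to_baseline)
saving_us = baseline_us - tca_latency_us

print(f"implied baseline: {baseline_us:.2f} usec")
print(f"latency saved per message: {saving_us:.2f} usec")
```

A fixed saving of a few microseconds per message matters most when messages are small and frequent, which is the regime a strong-scaling run on a small problem falls into.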

