Design and Implementation of Broadcast Algorithms for Extreme-Scale Systems

2011 IEEE International Conference on Cluster Computing. Pub Date: 2011-09-26. DOI: 10.1109/CLUSTER.2011.17

Pavel Shamis, R. Graham, Manjunath Gorentla Venkata, Joshua Ladd

Citations: 4

Abstract

The scalability and performance of collective communication operations limit the scalability and performance of many scientific applications. This paper presents two new blocking and nonblocking Broadcast algorithms for communicators with arbitrary communication topology, and studies their performance. These algorithms benefit from increased concurrency and a reduced memory footprint, making them suitable for use on large-scale systems. In measurements of small, medium, and large data Broadcasts on a Cray-XT5 using 24,576 MPI processes, the Cheetah algorithms outperform the native MPI on that system by 51%, 69%, and 9%, respectively. These results demonstrate an algorithmic approach to implementing the important class of collective communications that is high-performing and scalable, and that also uses resources in a scalable manner.



