Hybrid parallel programming with MPI and unified parallel C

James Dinan, P. Balaji, E. Lusk, P. Sadayappan, R. Thakur
{"title":"Hybrid parallel programming with MPI and unified parallel C","authors":"James Dinan, P. Balaji, E. Lusk, P. Sadayappan, R. Thakur","doi":"10.1145/1787275.1787323","DOIUrl":null,"url":null,"abstract":"The Message Passing Interface (MPI) is one of the most widely used programming models for parallel computing. However, the amount of memory available to an MPI process is limited by the amount of local memory within a compute node. Partitioned Global Address Space (PGAS) models such as Unified Parallel C (UPC) are growing in popularity because of their ability to provide a shared global address space that spans the memories of multiple compute nodes. However, taking advantage of UPC can require a large recoding effort for existing parallel applications. In this paper, we explore a new hybrid parallel programming model that combines MPI and UPC. This model allows MPI programmers incremental access to a greater amount of memory, enabling memory-constrained MPI codes to process larger data sets. In addition, the hybrid model offers UPC programmers an opportunity to create static UPC groups that are connected over MPI. As we demonstrate, the use of such groups can significantly improve the scalability of locality-constrained UPC codes. This paper presents a detailed description of the hybrid model and demonstrates its effectiveness in two applications: a random access benchmark and the Barnes-Hut cosmological simulation. Experimental results indicate that the hybrid model can greatly enhance performance; using hybrid UPC groups that span two cluster nodes, RA performance increases by a factor of 1.33 and using groups that span four cluster nodes, Barnes-Hut experiences a twofold speedup at the expense of a 2% increase in code size.","PeriodicalId":151791,"journal":{"name":"Proceedings of the 7th ACM international conference on Computing frontiers","volume":"14 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"56","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 7th ACM international conference on Computing frontiers","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1787275.1787323","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 56

Abstract

The Message Passing Interface (MPI) is one of the most widely used programming models for parallel computing. However, the amount of memory available to an MPI process is limited by the amount of local memory within a compute node. Partitioned Global Address Space (PGAS) models such as Unified Parallel C (UPC) are growing in popularity because of their ability to provide a shared global address space that spans the memories of multiple compute nodes. However, taking advantage of UPC can require a large recoding effort for existing parallel applications. In this paper, we explore a new hybrid parallel programming model that combines MPI and UPC. This model allows MPI programmers incremental access to a greater amount of memory, enabling memory-constrained MPI codes to process larger data sets. In addition, the hybrid model offers UPC programmers an opportunity to create static UPC groups that are connected over MPI. As we demonstrate, the use of such groups can significantly improve the scalability of locality-constrained UPC codes. This paper presents a detailed description of the hybrid model and demonstrates its effectiveness in two applications: a random access benchmark and the Barnes-Hut cosmological simulation. Experimental results indicate that the hybrid model can greatly enhance performance; using hybrid UPC groups that span two cluster nodes, RA performance increases by a factor of 1.33 and using groups that span four cluster nodes, Barnes-Hut experiences a twofold speedup at the expense of a 2% increase in code size.
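To make the hybrid model concrete, the sketch below (ours, not code from the paper) shows one common way MPI and UPC can be combined: every UPC thread is also an MPI rank, a UPC shared array provides the partitioned global address space within a group, and thread 0 of each group funnels inter-group communication through an MPI "leaders" communicator built with MPI_Comm_split. The array name local_contrib, the funneled leader arrangement, and the assumption that all UPC threads of every group are launched under a single MPI job are illustrative choices; building and running such a program also assumes a UPC toolchain configured to interoperate with an MPI library, as in the paper's experiments.

/* Minimal hybrid MPI+UPC sketch (illustrative only).
 * Assumptions: each UPC thread is also an MPI rank; thread 0 of each
 * UPC group acts as the group's leader for MPI communication. */
#include <upc.h>
#include <mpi.h>
#include <stdio.h>

/* One element per UPC thread; element i has affinity to thread i. */
shared int local_contrib[THREADS];

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* Each UPC thread writes the slice of the shared array it owns. */
    local_contrib[MYTHREAD] = MYTHREAD + 1;
    upc_barrier;

    /* Only thread 0 of the group joins the leaders communicator;
     * the other threads pass MPI_UNDEFINED and get MPI_COMM_NULL. */
    MPI_Comm leaders = MPI_COMM_NULL;
    int color = (MYTHREAD == 0) ? 0 : MPI_UNDEFINED;
    MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &leaders);

    if (MYTHREAD == 0) {
        /* Reduce this group's data with UPC-level remote reads ... */
        int group_sum = 0;
        for (int t = 0; t < THREADS; t++)
            group_sum += local_contrib[t];

        /* ... then combine per-group results across groups over MPI. */
        int global_sum = 0;
        MPI_Allreduce(&group_sum, &global_sum, 1, MPI_INT,
                      MPI_SUM, leaders);
        printf("group sum = %d, sum over all groups = %d\n",
               group_sum, global_sum);
        MPI_Comm_free(&leaders);
    }

    upc_barrier;
    MPI_Finalize();
    return 0;
}

With a single UPC group the leaders communicator contains only thread 0; with several groups launched under one MPI job, the Allreduce combines one value per group, mirroring the "static UPC groups connected over MPI" organization the abstract describes.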