D. Hisley, G. Agrawal, P. Satya-narayana, L. Pollock

Title: Porting and performance evaluation of irregular codes using OpenMP
Journal: Concurrency: Practice and Experience, vol. 12, no. 12, October 2000
DOI: https://doi.org/10.1002/1096-9128(200010)12:12%3C1241::AID-CPE523%3E3.0.CO;2-D
Citations: 17

Abstract: In the last two years, OpenMP has been gaining popularity as a standard for developing portable shared memory parallel programs. With the improvements in centralized shared memory technologies and the emergence of distributed shared memory (DSM) architectures, several medium-to-large physical and logical shared memory configurations are now available. Thus, OpenMP stands to be a promising medium for developing scalable and portable parallel programs.

In this paper, we focus on evaluating the suitability of OpenMP for developing scalable and portable irregular applications. We examine the programming paradigms supported by OpenMP that are suitable for this important class of applications, the performance and scalability achieved with these applications, the achieved locality and uniprocessor cache performance, and the factors behind imperfect scalability. We have used two irregular applications and one NAS irregular code as the benchmarks for our study. Our experiments have been conducted on a 64-processor SGI Origin 2000.

Our experiments show that reasonably good scalability is possible using OpenMP if careful attention is paid to locality and load-balancing issues. In particular, using the Single Program Multiple Data (SPMD) paradigm for programming is a significant win over just using loop parallelization directives. As expected, the cost of remote accesses is the major factor behind the imperfect speedups of SPMD OpenMP programs. Copyright © 2000 John Wiley & Sons, Ltd.
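The abstract's central finding, that SPMD-style OpenMP outperforms plain loop-parallelization directives for irregular codes, can be made concrete with a short sketch. The CSR sparse matrix-vector kernel below, and all of its function and array names, are hypothetical illustrations rather than the paper's actual benchmarks; it merely contrasts (a) a per-loop `#pragma omp parallel for` with (b) an SPMD-style single parallel region in which each thread owns a fixed block of rows, the kind of structure that favors locality on a cc-NUMA machine such as the Origin 2000.

```c
/*
 * Illustrative contrast between the two OpenMP styles the paper compares:
 *  (a) plain loop-parallelization directives, and
 *  (b) an SPMD-style single parallel region with explicit, thread-owned
 *      iteration ranges.
 * The kernel and its names are hypothetical, not taken from the paper.
 */
#include <omp.h>
#include <stdio.h>

/* y = A*x for a sparse matrix stored in compressed-row (CSR) form. */

/* (a) Loop-directive version: one directive per parallel loop. */
static void spmv_loop_directive(int nrows, const int *rowptr, const int *col,
                                const double *val, const double *x, double *y)
{
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < nrows; i++) {
        double sum = 0.0;
        for (int j = rowptr[i]; j < rowptr[i + 1]; j++)
            sum += val[j] * x[col[j]];   /* irregular access through col[] */
        y[i] = sum;
    }
}

/* (b) SPMD-style version: one enclosing parallel region; each thread owns a
 * fixed block of rows, so repeated invocations reuse thread-local data. */
static void spmv_spmd(int nrows, const int *rowptr, const int *col,
                      const double *val, const double *x, double *y)
{
    #pragma omp parallel
    {
        int nthreads = omp_get_num_threads();
        int tid      = omp_get_thread_num();
        int chunk    = (nrows + nthreads - 1) / nthreads;
        int lo       = tid * chunk;
        int hi       = lo + chunk < nrows ? lo + chunk : nrows;

        for (int i = lo; i < hi; i++) {
            double sum = 0.0;
            for (int j = rowptr[i]; j < rowptr[i + 1]; j++)
                sum += val[j] * x[col[j]];
            y[i] = sum;
        }
    }
}

int main(void)
{
    /* Tiny 3x3 example matrix [[2,0,1],[0,3,0],[4,0,5]] in CSR form. */
    int    rowptr[] = {0, 2, 3, 5};
    int    col[]    = {0, 2, 1, 0, 2};
    double val[]    = {2, 1, 3, 4, 5};
    double x[]      = {1, 1, 1};
    double y[3];

    spmv_loop_directive(3, rowptr, col, val, x, y);
    printf("loop-directive: %g %g %g\n", y[0], y[1], y[2]);

    spmv_spmd(3, rowptr, col, val, x, y);
    printf("SPMD-style:     %g %g %g\n", y[0], y[1], y[2]);
    return 0;
}
```

Compiled with an OpenMP-enabled compiler (e.g. `cc -fopenmp`), both versions compute the same result; the SPMD version's fixed block ownership is what allows first-touch page placement to keep each thread's rows in local memory across repeated calls, which is the kind of locality effect the paper's experiments examine.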