Abstractions and Directives for Adapting Wavefront Algorithms to Future Architectures

Proceedings of the Platform for Advanced Scientific Computing Conference. Published: 2018-07-02. DOI: 10.1145/3218176.3218228

Robert Searles, S. Chandrasekaran, W. Joubert, Oscar R. Hernandez

Abstract

Architectures are evolving rapidly, and exascale machines are expected to offer billion-way concurrency. To migrate large-scale applications to these machines and exploit their parallelism, we need to rethink algorithms, languages, and programming models, among other components. Although directive-based programming models let programmers worry less about programming and more about science, expressing complex parallel patterns in these models can be daunting, especially when the goal is to match the performance the hardware can offer. One such pattern is the wavefront. This paper studies in depth a wavefront-based miniapplication for Denovo, a production code for nuclear reactor modeling. We parallelize the Koch-Baker-Alcouffe (KBA) parallel-wavefront sweep algorithm in the main kernel of Minisweep (the miniapplication) using CUDA, OpenMP, and OpenACC. Our OpenACC implementation running on NVIDIA's next-generation Volta GPU achieves an 85.06x speedup over serial code, slightly higher than the 83.72x speedup that CUDA achieves over the same serial implementation. Our experimental platforms include SummitDev, an ORNL system representative of the architecture of the upcoming Summit supercomputer. Our parallelization effort across platforms also motivated us to define an architecture-independent abstract parallelism model, with the goal of creating software abstractions that applications employing the wavefront sweep motif can use.
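The wavefront pattern can be illustrated with a minimal 2D sketch, assuming a toy recurrence in which each cell depends only on its -x and -y neighbors. Cells on the same anti-diagonal (i + j = d) do not depend on one another, so a sweep can advance diagonal by diagonal and update the cells of each diagonal in parallel. This is not the Minisweep kernel or the KBA algorithm itself, which works on a 3D grid and pipelines work over further dimensions such as angles and energy groups; the grid sizes `NX`/`NY`, `cell_update`, `sweep_serial`, and `sweep_wavefront` are hypothetical names chosen for illustration.

```c
#define NX 8  /* hypothetical grid size in x */
#define NY 8  /* hypothetical grid size in y */

/* Toy upwind recurrence: a cell depends on its -x and -y neighbors.
   Out-of-range neighbors contribute 0; the source term is fixed at 1. */
static long cell_update(long psi[NX][NY], int i, int j) {
    long west  = (i > 0) ? psi[i - 1][j] : 0;
    long south = (j > 0) ? psi[i][j - 1] : 0;
    return 1 + west + south;
}

/* Reference: a plain lexicographic sweep, which respects the dependencies
   because (i-1, j) and (i, j-1) are always visited before (i, j). */
void sweep_serial(long psi[NX][NY]) {
    for (int i = 0; i < NX; ++i)
        for (int j = 0; j < NY; ++j)
            psi[i][j] = cell_update(psi, i, j);
}

/* Wavefront sweep: visit anti-diagonals d = i + j in increasing order.
   Every cell on diagonal d reads only cells on diagonal d - 1, so the
   inner loop has no cross-iteration dependencies and can run in parallel. */
void sweep_wavefront(long psi[NX][NY]) {
    for (int d = 0; d <= (NX - 1) + (NY - 1); ++d) {
        /* Clip the diagonal to the grid: 0 <= i < NX and 0 <= d - i < NY. */
        int i_lo = (d - (NY - 1) > 0) ? d - (NY - 1) : 0;
        int i_hi = (d < NX - 1) ? d : NX - 1;
        #pragma omp parallel for
        for (int i = i_lo; i <= i_hi; ++i) {
            int j = d - i;
            psi[i][j] = cell_update(psi, i, j);
        }
    }
}
```

With OpenMP enabled (for example GCC's `-fopenmp`), the per-diagonal loop runs in parallel; without it the pragma is ignored and the code runs serially. Either way, `sweep_wavefront` should produce the same values as `sweep_serial`, since only the visiting order changes. A GPU-oriented directive model would annotate the same inner loop; the sequential outer loop over diagonals is what limits concurrency at the start and end of each sweep.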
