Porting a 3D seismic modeling code (SW4) to CORAL machines

Impact factor 1.3 · Region 4 (Computer Science) · JCR Q1 (Computer Science)

IBM Journal of Research and Development, vol. 64, no. 3/4, pp. 17:1-17:11 · Pub Date: 2019-12-17 · DOI: 10.1147/JRD.2019.2960218

R. Pankajakshan; P.-H. Lin; B. Sjögreen

Citations: 2

Abstract

Porting a 3D seismic modeling code (SW4) to CORAL machines

Seismic Waves, fourth order (SW4) solves the seismic wave equations on Cartesian and curvilinear grids using large compute clusters with O(100,000) cores. This article discusses porting SW4 to the CORAL architecture using the RAJA performance portability abstraction layer. The performance of key kernels written with RAJA and with CUDA is compared to estimate the penalty incurred by the abstraction layer. The code changes required for efficiency on GPUs and for minimizing time spent in the Message Passing Interface (MPI) are discussed. The article describes a path for efficiently porting large code bases to GPU-based machines while avoiding the pitfalls of a new architecture in the early stages of its deployment. Current bottlenecks in the code are discussed along with possible architectural or software mitigations. SW4 runs 28× faster on one 4-GPU CORAL node than on a CTS-1 node (dual Intel Xeon E5-2695 v4). SW4 is now in routine use on problems of unprecedented resolution (203 billion grid points) and scale, on 1,200 nodes of Summit.
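To make the idea of a portability abstraction layer concrete, the following is a minimal C++ sketch, not taken from the article and not RAJA's actual API. It assumes a hypothetical `forall` loop construct and a hypothetical `seq_exec` policy tag, and applies the standard fourth-order central difference for a second derivative on a uniform 1D grid. SW4's real kernels are 3D and considerably more involved; the point here is only that the kernel body is written once as a lambda while the execution backend is chosen by the policy argument.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical execution policy tag (an assumption for illustration).
// A GPU backend would add another tag and another overload of forall,
// leaving the kernel bodies unchanged.
struct seq_exec {};

// Minimal stand-in for a portability layer's loop construct.
// This is NOT RAJA's API; it only mimics the general shape of the idea.
template <typename Body>
void forall(seq_exec, std::size_t begin, std::size_t end, Body&& body) {
    for (std::size_t i = begin; i < end; ++i) {
        body(i);
    }
}

// Standard fourth-order central difference for d2u/dx2 on a uniform grid:
//   (-u[i-2] + 16 u[i-1] - 30 u[i] + 16 u[i+1] - u[i+2]) / (12 h^2)
// Only interior points (2 .. n-3) are written; boundary entries of `out`
// are left untouched.
template <typename Policy>
void second_derivative(Policy policy, const std::vector<double>& u,
                       std::vector<double>& out, double h) {
    assert(u.size() == out.size());
    assert(h > 0.0);
    if (u.size() < 5) return;  // stencil needs two neighbours on each side
    const double scale = 1.0 / (12.0 * h * h);
    const double* in = u.data();
    double* res = out.data();
    // Capture raw pointers by value, as a device lambda typically would.
    forall(policy, 2, u.size() - 2, [=](std::size_t i) {
        res[i] = scale * (-in[i - 2] + 16.0 * in[i - 1] - 30.0 * in[i]
                          + 16.0 * in[i + 1] - in[i + 2]);
    });
}
```

Calling `second_derivative(seq_exec{}, u, out, h)` with `u[i] = (i*h)^2` should give values very close to 2 at every interior point, since this stencil is exact for low-degree polynomials up to rounding. Comparing such a policy-driven kernel against a hand-written backend-specific version is, in spirit, the kind of measurement the abstract describes for RAJA versus CUDA.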

Source journal

IBM Journal of Research and Development (category: Engineering and Technology, Computer Science: Hardware)

Self-citation rate

0.00%

Articles published

Review time

6-12 weeks

About the journal: The IBM Journal of Research and Development is a peer-reviewed technical journal, published bimonthly, that features the work of authors in the science, technology, and engineering of information systems. Papers are written for the worldwide scientific research and development community and for knowledgeable professionals. Submissions are welcome from the IBM technical community and from non-IBM authors on topics relevant to the scientific and technical content of the Journal.