将3D地震建模代码(SW4)移植到CORAL机器上

IF 1.3 4区 计算机科学 Q1 Computer Science
R. Pankajakshan;P.-H. Lin;B. Sjögreen
{"title":"将3D地震建模代码(SW4)移植到CORAL机器上","authors":"R. Pankajakshan;P.-H. Lin;B. Sjögreen","doi":"10.1147/JRD.2019.2960218","DOIUrl":null,"url":null,"abstract":"Seismic waves fourth order (SW4) solves the seismic wave equations on Cartesian and curvilinear grids using large compute clusters with O (100,000) cores. This article discusses the porting of SW4 to run on the CORAL architecture using the RAJA performance portability abstraction layer. The performances of key kernels using RAJA and CUDA are compared to estimate the performance penalty of using the portability abstraction layer. Code changes required for efficiency on GPUs and minimizing time spent in Message Passing Interface (MPI) are discussed. This article describes a path for efficiently porting large code bases to GPU-based machines while avoiding the pitfalls of a new architecture in the early stages of its deployment. Current bottlenecks in the code are discussed along with possible architectural or software mitigations. SW4 runs 28× faster on one 4-GPU CORAL node than on a CTS-1 node (Dual Intel Xeon E5-2695 v4). SW4 is now in routine use on problems of unprecedented resolution (203 billion grid points) and scale on 1,200 nodes of Summit.","PeriodicalId":55034,"journal":{"name":"IBM Journal of Research and Development","volume":null,"pages":null},"PeriodicalIF":1.3000,"publicationDate":"2019-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1147/JRD.2019.2960218","citationCount":"2","resultStr":"{\"title\":\"Porting a 3D seismic modeling code (SW4) to CORAL machines\",\"authors\":\"R. Pankajakshan;P.-H. Lin;B. Sjögreen\",\"doi\":\"10.1147/JRD.2019.2960218\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Seismic waves fourth order (SW4) solves the seismic wave equations on Cartesian and curvilinear grids using large compute clusters with O (100,000) cores. This article discusses the porting of SW4 to run on the CORAL architecture using the RAJA performance portability abstraction layer. The performances of key kernels using RAJA and CUDA are compared to estimate the performance penalty of using the portability abstraction layer. Code changes required for efficiency on GPUs and minimizing time spent in Message Passing Interface (MPI) are discussed. This article describes a path for efficiently porting large code bases to GPU-based machines while avoiding the pitfalls of a new architecture in the early stages of its deployment. Current bottlenecks in the code are discussed along with possible architectural or software mitigations. SW4 runs 28× faster on one 4-GPU CORAL node than on a CTS-1 node (Dual Intel Xeon E5-2695 v4). SW4 is now in routine use on problems of unprecedented resolution (203 billion grid points) and scale on 1,200 nodes of Summit.\",\"PeriodicalId\":55034,\"journal\":{\"name\":\"IBM Journal of Research and Development\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":1.3000,\"publicationDate\":\"2019-12-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://sci-hub-pdf.com/10.1147/JRD.2019.2960218\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IBM Journal of Research and Development\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/8935353/\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"Computer Science\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IBM Journal of Research and Development","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/8935353/","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"Computer Science","Score":null,"Total":0}
引用次数: 2

摘要

四阶地震波(SW4)使用具有O(100000)核的大型计算集群在笛卡尔和曲线网格上求解地震波方程。本文讨论了使用RAJA性能可移植性抽象层将SW4移植到CORAL架构上运行。比较了使用RAJA和CUDA的关键内核的性能,以估计使用可移植性抽象层的性能损失。讨论了提高GPU效率和最小化在消息传递接口(MPI)中花费的时间所需的代码更改。本文描述了一种有效地将大型代码库移植到基于GPU的机器的路径,同时避免了新架构在部署早期阶段的陷阱。讨论了代码中的当前瓶颈以及可能的体系结构或软件缓解措施。SW4在一个4-GPU CORAL节点上的运行速度是CTS-1节点(双Intel Xeon E5-2695 v4)的28倍。SW4目前正在Summit的1200个节点上以前所未有的分辨率(2030亿个网格点)和规模进行常规使用。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Porting a 3D seismic modeling code (SW4) to CORAL machines
Seismic waves fourth order (SW4) solves the seismic wave equations on Cartesian and curvilinear grids using large compute clusters with O (100,000) cores. This article discusses the porting of SW4 to run on the CORAL architecture using the RAJA performance portability abstraction layer. The performances of key kernels using RAJA and CUDA are compared to estimate the performance penalty of using the portability abstraction layer. Code changes required for efficiency on GPUs and minimizing time spent in Message Passing Interface (MPI) are discussed. This article describes a path for efficiently porting large code bases to GPU-based machines while avoiding the pitfalls of a new architecture in the early stages of its deployment. Current bottlenecks in the code are discussed along with possible architectural or software mitigations. SW4 runs 28× faster on one 4-GPU CORAL node than on a CTS-1 node (Dual Intel Xeon E5-2695 v4). SW4 is now in routine use on problems of unprecedented resolution (203 billion grid points) and scale on 1,200 nodes of Summit.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
IBM Journal of Research and Development
IBM Journal of Research and Development 工程技术-计算机:硬件
自引率
0.00%
发文量
0
审稿时长
6-12 weeks
期刊介绍: The IBM Journal of Research and Development is a peer-reviewed technical journal, published bimonthly, which features the work of authors in the science, technology and engineering of information systems. Papers are written for the worldwide scientific research and development community and knowledgeable professionals. Submitted papers are welcome from the IBM technical community and from non-IBM authors on topics relevant to the scientific and technical content of the Journal.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信