APPLE-MASNUM: Accelerating parallel processing for lightweight expansion of MASNUM on a single multi-GPU node

IF 3.1 | Zone 3, Earth Sciences | JCR Q2, Meteorology & Atmospheric Sciences
Qi Lou, Changmao Wu, Changming Dong, Xingru Feng, Yuanyuan Xia, Li Liu, Zhengwei Xu, Xu Gao, Meng Sun, Xunqiang Yin
Ocean Modelling, vol. 196, Article 102557. Published 2025-05-02. DOI: 10.1016/j.ocemod.2025.102557
https://www.sciencedirect.com/science/article/pii/S1463500325000605
Citations: 0

Abstract

The Marine Science and Numerical Modeling (MASNUM) system, developed for oceanic wave forecasting, plays an important role in marine disaster prevention and maritime activities. However, its application is hampered by its large computing-resource requirements. To overcome this barrier, we have implemented accelerating parallel processing for lightweight expansion of MASNUM (APPLE-MASNUM) on a single compute node with multiple GPUs. First, the mathematical-physics equations of the MASNUM system are analyzed to pinpoint the primary computational bottlenecks. This study then transforms MASNUM from a multi-process MPI program into a preliminary GPU-compatible algorithm. Subsequently, the paper proposes an optimization strategy for two-dimensional four-point stencil computations. Following this, an optimization method for overlapping computation with communication is introduced. Finally, a refined data layout scheme tailored for GPUs is designed and implemented. Three numerical experiments with five-day wave forecasts demonstrate that, compared to single-core MASNUM, the framework presented in this study achieves speedups of 49.29-fold, 62.58-fold, and 65.74-fold, respectively. This considerable performance boost highlights the efficiency of the lightweight APPLE-MASNUM framework. This work represents the first implementation and optimization of the MASNUM model on a GPU-based heterogeneous platform.
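The abstract does not give the form of the stencil or the data layout used in APPLE-MASNUM. As a generic illustration only, and not the authors' method, a two-dimensional four-point stencil on a flattened row-major grid might be sketched as follows. The function name `stencil_step`, the averaging weights, and the fixed-boundary treatment are all assumptions made for this example.

```python
def stencil_step(field, nx, ny):
    """Apply one four-point averaging step to a row-major nx-by-ny grid.

    Hypothetical illustration: each interior point becomes the mean of its
    west, east, south and north neighbours. Boundary points are copied
    unchanged. The real MASNUM stencil is not specified in the abstract.
    """
    out = list(field)  # copy so every update reads the old values
    for j in range(1, ny - 1):
        row = j * nx  # offset of row j in the flattened array
        for i in range(1, nx - 1):
            c = row + i
            out[c] = 0.25 * (
                field[c - 1]      # west neighbour
                + field[c + 1]    # east neighbour
                + field[c - nx]   # south neighbour (previous row)
                + field[c + nx]   # north neighbour (next row)
            )
    return out


# A field that varies linearly with the flat index is left unchanged by
# four-point averaging, which gives a quick sanity check.
nx, ny = 4, 3
grid = [float(v) for v in range(nx * ny)]
result = stencil_step(grid, nx, ny)
```

Storing the grid as one contiguous array with `i` as the fastest-varying index means that neighbouring points along a row sit next to each other in memory, which is the kind of access pattern GPU threads generally favour. Whether the paper's refined layout follows this pattern is not stated in the abstract.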
Source journal
Ocean Modelling (Geosciences: Oceanography)
CiteScore: 5.50
Self-citation rate: 9.40%
Articles published: 86
Review time: 19.6 weeks
About the journal: The main objective of Ocean Modelling is to provide rapid communication between those interested in ocean modelling, whether through direct observation, or through analytical, numerical or laboratory models, and including interactions between physical and biogeochemical or biological phenomena. Because of the intimate links between ocean and atmosphere, involvement of scientists interested in influences of either medium on the other is welcome. The journal has a wide scope and includes ocean-atmosphere interaction in various forms as well as pure ocean results. In addition to primary peer-reviewed papers, the journal provides review papers, preliminary communications, and discussions.