Accelerating an implicit ocean model using CUDA C

Impact factor: 4.4 | Zone 2 (Engineering and Technology) | JCR Q1 (Engineering, Ocean)
Jianbin Xie, Xingru Feng, Tianhai Gao, Changming Dong, Baoshu Yin, Changmao Wu
Applied Ocean Research, volume 163, Article 104740. Journal article, published 2025-08-25.
DOI: 10.1016/j.apor.2025.104740
https://www.sciencedirect.com/science/article/pii/S0141118725003268
Citations: 0

Abstract

In this study, we developed an ocean model named GPU-IOCASM (GPU-Implicit Ocean Current and Storm Surge Model), which uses the finite difference method with implicit iteration to keep the simulation stable. It also supports online nesting of multi-layer computational grids, so that the grid can be refined locally in critical regions to improve simulation accuracy. To maximize GPU parallelism and minimize memory overhead, we optimized the residual update algorithm, applied a mask-based conditional computation method, and designed an adaptive strategy for predicting the iteration count. When the simulation reaches a designated output time, the relevant variables are copied from GPU memory to host memory while the GPU proceeds with the next computation without waiting for the I/O operation to complete. This process is designed to run asynchronously in most cases, so that data transfer and CPU-side operations do not interfere with GPU-based computation. Verification shows that GPU-IOCASM's results agree closely with both observed data and SCHISM's results, confirming its reliability and precision. GPU-IOCASM also achieves a speedup of over 312 times compared with traditional CPU-based approaches. Unlike traditional GPU acceleration methods that require frequent data transfers between the CPU and GPU, GPU-IOCASM is designed to perform as much computation as possible on the GPU, which minimizes data transfer overhead and improves computational efficiency.
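The output scheme described in the abstract (copy variables to the host at an output time while the GPU keeps computing) can be illustrated with a minimal CUDA sketch. This is not the authors' implementation, which the abstract does not show. The kernel `step`, the helper `check_output`, the buffer names, and the two-stream snapshot design are all assumptions made for illustration. The kernel is a toy stand-in for a model time step, and it also shows one simple form of mask-based conditional computation, where only cells flagged in a mask are updated.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define CHECK(call) do { cudaError_t e_ = (call); if (e_ != cudaSuccess) { \
    fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__, cudaGetErrorString(e_)); \
    exit(1); } } while (0)

// Hypothetical stand-in for one model time step: only masked cells are updated.
__global__ void step(float *h, const unsigned char *mask, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && mask[i]) h[i] += 1.0f;
}

// Stand-in for writing output: counts cells that differ from the expected value.
static int check_output(const float *host, const unsigned char *mask, int n, int stepNo) {
    int bad = 0;
    for (int i = 0; i < n; ++i) {
        float want = mask[i] ? (float)stepNo : 0.0f;
        if (host[i] != want) ++bad;
    }
    return bad;
}

int main() {
    const int n = 1 << 20, steps = 40, every = 10;  // output every 10 steps
    const size_t bytes = n * sizeof(float);

    unsigned char *maskHost = (unsigned char *)malloc(n);
    for (int i = 0; i < n; ++i) maskHost[i] = (i % 3 != 0);  // arbitrary toy mask

    float *h, *snap, *out;
    unsigned char *mask;
    CHECK(cudaMalloc(&h, bytes));
    CHECK(cudaMalloc(&snap, bytes));     // device-side snapshot of the field
    CHECK(cudaMalloc(&mask, n));
    CHECK(cudaMallocHost(&out, bytes));  // pinned host memory, needed for a truly async copy
    CHECK(cudaMemset(h, 0, bytes));
    CHECK(cudaMemcpy(mask, maskHost, n, cudaMemcpyHostToDevice));
    CHECK(cudaDeviceSynchronize());

    cudaStream_t compute, copy;
    cudaEvent_t snapReady, copyDone;
    CHECK(cudaStreamCreate(&compute));
    CHECK(cudaStreamCreate(&copy));
    CHECK(cudaEventCreate(&snapReady));
    CHECK(cudaEventCreate(&copyDone));

    int pending = 0, bad = 0;  // pending = step number of the copy still in flight
    for (int s = 1; s <= steps; ++s) {
        step<<<(n + 255) / 256, 256, 0, compute>>>(h, mask, n);
        CHECK(cudaGetLastError());
        if (s % every != 0) continue;

        // Consume the previous output before its buffers are reused.
        if (pending) {
            CHECK(cudaEventSynchronize(copyDone));
            bad += check_output(out, maskHost, n, pending);
        }
        // Snapshot on the compute stream so later steps cannot overwrite the data being copied.
        CHECK(cudaMemcpyAsync(snap, h, bytes, cudaMemcpyDeviceToDevice, compute));
        CHECK(cudaEventRecord(snapReady, compute));
        // The device-to-host copy runs on its own stream; the compute stream is not blocked.
        CHECK(cudaStreamWaitEvent(copy, snapReady, 0));
        CHECK(cudaMemcpyAsync(out, snap, bytes, cudaMemcpyDeviceToHost, copy));
        CHECK(cudaEventRecord(copyDone, copy));
        pending = s;
    }
    if (pending) {
        CHECK(cudaEventSynchronize(copyDone));
        bad += check_output(out, maskHost, n, pending);
    }
    CHECK(cudaDeviceSynchronize());
    printf("mismatched cells across all outputs: %d\n", bad);

    CHECK(cudaEventDestroy(snapReady)); CHECK(cudaEventDestroy(copyDone));
    CHECK(cudaStreamDestroy(compute));  CHECK(cudaStreamDestroy(copy));
    CHECK(cudaFree(h)); CHECK(cudaFree(snap)); CHECK(cudaFree(mask));
    CHECK(cudaFreeHost(out)); free(maskHost);
    return bad != 0;
}
```

The sketch can be compiled with `nvcc`. The host waits only at an output time, and only for the previous output's copy, which has normally finished by then; this matches the abstract's "asynchronous in most cases" wording in spirit, though the model's actual buffering strategy may differ.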
Source journal: Applied Ocean Research (Geosciences - Engineering: Ocean)
CiteScore: 8.70
Self-citation rate: 7.00%
Articles published: 316
Review time: 59 days
About the journal: The aim of Applied Ocean Research is to encourage the submission of papers that advance the state of knowledge in a range of topics relevant to ocean engineering.