Accelerating an implicit ocean model using CUDA C

Impact factor: 4.4 | CAS Zone 2 (Engineering & Technology) | JCR Q1 (Engineering, Ocean)
Jianbin Xie, Xingru Feng, Tianhai Gao, Changming Dong, Baoshu Yin, Changmao Wu
Journal: Applied Ocean Research, vol. 163, Article 104740
Published: 2025-08-25
DOI: 10.1016/j.apor.2025.104740
URL: https://www.sciencedirect.com/science/article/pii/S0141118725003268
Citations: 0

Abstract

In this study, we developed an ocean model named GPU-IOCASM (GPU-Implicit Ocean Current and Storm Surge Model), which uses the finite difference method with implicit iteration to keep the simulation stable. It also supports online nesting of multi-layer computational grids, allowing localized grid refinement in critical regions to improve simulation accuracy. To maximize GPU parallelism and minimize memory overhead, we optimized the residual update algorithm, applied a mask-based conditional computation method, and designed an adaptive strategy for predicting the iteration count. When the simulation reaches a designated output time, the relevant variables are copied from GPU memory to host memory while the GPU proceeds with the next computation without waiting for the I/O operation to complete. This process is designed to run asynchronously in most cases, so that data transfer and CPU-side operations do not interfere with GPU-based computation. Verification shows that GPU-IOCASM's results agree closely with both observed data and SCHISM's results, supporting its reliability and precision. GPU-IOCASM also achieves a speedup of more than 312 times over traditional CPU-based approaches. Unlike traditional GPU acceleration methods that require frequent data transfers between the CPU and GPU, GPU-IOCASM is designed to perform as much computation as possible on the GPU, thereby minimizing data transfer overhead and improving computational efficiency.
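The abstract gives no code, so the following is only a minimal sketch of two of the ideas it names: a mask-based conditional update and an output copy that overlaps with further GPU work. It is not the authors' implementation. Every name here (`masked_step`, `consume`, `d_snap`, the two streams, the toy "halve the wet cells" update) is a hypothetical stand-in, and the snapshot buffer is one assumed way to let the compute stream keep modifying the field while the copy is in flight.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            std::fprintf(stderr, "CUDA error %s at line %d\n",       \
                         cudaGetErrorString(err_), __LINE__);        \
            std::exit(EXIT_FAILURE);                                 \
        }                                                            \
    } while (0)

// Hypothetical stand-in for one model step: halve the water level on wet
// cells (mask == 1) and leave dry cells (mask == 0) untouched.
__global__ void masked_step(float* eta, const unsigned char* mask, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && mask[i]) eta[i] *= 0.5f;
}

// Host-side "output": wait for the pending copy, then use the snapshot.
static void consume(cudaEvent_t copy_done, const float* h_snap, int step) {
    CHECK(cudaEventSynchronize(copy_done));
    std::printf("step %d: wet=%g dry=%g\n", step, h_snap[0], h_snap[1]);
}

int main() {
    const int n = 1 << 20, steps = 8, out_every = 4;
    const size_t bytes = n * sizeof(float);

    float *d_eta, *d_snap, *h_snap;
    unsigned char *d_mask;
    unsigned char *h_mask = static_cast<unsigned char*>(std::malloc(n));
    if (h_mask == nullptr) return EXIT_FAILURE;
    CHECK(cudaMalloc(&d_eta, bytes));
    CHECK(cudaMalloc(&d_snap, bytes));
    CHECK(cudaMalloc(&d_mask, n));
    CHECK(cudaMallocHost(&h_snap, bytes));  // pinned, so the D2H copy can be asynchronous

    // Toy initial state: level 1.0 everywhere, even cells wet, odd cells dry.
    for (int i = 0; i < n; ++i) { h_snap[i] = 1.0f; h_mask[i] = (i % 2 == 0); }
    CHECK(cudaMemcpy(d_eta, h_snap, bytes, cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(d_mask, h_mask, n, cudaMemcpyHostToDevice));

    cudaStream_t compute, io;
    cudaEvent_t snap_ready, copy_done;
    CHECK(cudaStreamCreate(&compute));
    CHECK(cudaStreamCreate(&io));
    CHECK(cudaEventCreate(&snap_ready));
    CHECK(cudaEventCreate(&copy_done));

    const int block = 256, grid = (n + block - 1) / block;
    int pending_step = 0;  // 0 means no snapshot is waiting to be consumed
    for (int s = 1; s <= steps; ++s) {
        masked_step<<<grid, block, 0, compute>>>(d_eta, d_mask, n);
        CHECK(cudaGetLastError());
        if (s % out_every != 0) continue;

        // Only here can the host block: the previous snapshot must be out of
        // d_snap and h_snap before either buffer is reused.
        if (pending_step) consume(copy_done, h_snap, pending_step);

        // Snapshot on the compute stream, then hand it to the I/O stream so
        // later kernels can overwrite d_eta while the copy is in flight.
        CHECK(cudaMemcpyAsync(d_snap, d_eta, bytes, cudaMemcpyDeviceToDevice, compute));
        CHECK(cudaEventRecord(snap_ready, compute));
        CHECK(cudaStreamWaitEvent(io, snap_ready, 0));
        CHECK(cudaMemcpyAsync(h_snap, d_snap, bytes, cudaMemcpyDeviceToHost, io));
        CHECK(cudaEventRecord(copy_done, io));
        pending_step = s;
    }
    if (pending_step) consume(copy_done, h_snap, pending_step);

    CHECK(cudaStreamSynchronize(compute));
    CHECK(cudaEventDestroy(snap_ready));
    CHECK(cudaEventDestroy(copy_done));
    CHECK(cudaStreamDestroy(compute));
    CHECK(cudaStreamDestroy(io));
    CHECK(cudaFreeHost(h_snap));
    CHECK(cudaFree(d_eta));
    CHECK(cudaFree(d_snap));
    CHECK(cudaFree(d_mask));
    std::free(h_mask);
    return 0;
}
```

In this sketch the host only waits when a new output time arrives before the previous transfer has finished, which is one plausible reading of "asynchronous in most cases". A production model would more likely hand the pinned buffer to a separate writer thread rather than print from the time-stepping loop.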
Source journal: Applied Ocean Research (category: Geosciences / Engineering, Ocean)
CiteScore: 8.70
Self-citation rate: 7.00%
Articles published: 316
Review time: 59 days
About the journal: The aim of Applied Ocean Research is to encourage the submission of papers that advance the state of knowledge in a range of topics relevant to ocean engineering.