Accelerating an implicit ocean model using CUDA C

IF 4.4 | Zone 2 (Engineering and Technology) | JCR Q1 (ENGINEERING, OCEAN)

Applied Ocean Research, Volume 163, Article 104740 | Pub Date: 2025-08-25 | DOI: 10.1016/j.apor.2025.104740

Jianbin Xie, Xingru Feng, Tianhai Gao, Changming Dong, Baoshu Yin, Changmao Wu


Citations: 0

Abstract


Accelerating an implicit ocean model using CUDA C

In this study, we developed an ocean model named GPU-IOCASM (GPU-Implicit Ocean Current and Storm Surge Model), which employs the finite difference method with implicit iteration to ensure simulation stability. It also incorporates online nesting of multi-layer computational grids, allowing localized grid refinement in critical regions to enhance simulation accuracy. To maximize GPU parallelism and minimize memory overhead, we optimized the residual update algorithm, applied a mask-based conditional computation method, and designed an adaptive iteration count prediction strategy. When the simulation reaches a designated output time, the relevant variables are copied from GPU memory to host memory, while the GPU proceeds with the next computation without waiting for the I/O operation to complete. This process is designed to run asynchronously in most cases, so that data transfer and CPU-side operations do not interfere with GPU-based computation. Verification shows that GPU-IOCASM's results agree closely with both observed data and SCHISM's results, supporting its reliability and accuracy. Furthermore, GPU-IOCASM achieves a speedup of more than 312 times over traditional CPU-based approaches. Unlike traditional GPU acceleration methods that require frequent data transfers between the CPU and GPU, GPU-IOCASM is designed to perform as much computation as possible on the GPU, thereby minimizing data transfer overhead and improving computational efficiency.
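The abstract does not give implementation details, so the following is only a minimal sketch of how mask-based conditional computation and asynchronous output could be arranged in CUDA C, not the authors' code. The kernel `masked_relax`, the "wet cell" interpretation of the mask, the placeholder update formula, the buffer names, and the two-stream layout are all assumptions made for illustration. Only standard CUDA runtime calls are used.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Abort main() with a message if a CUDA runtime call fails.
#define CHECK(call)                                              \
    do {                                                         \
        cudaError_t err_ = (call);                               \
        if (err_ != cudaSuccess) {                               \
            std::fprintf(stderr, "CUDA error: %s\n",             \
                         cudaGetErrorString(err_));              \
            return 1;                                            \
        }                                                        \
    } while (0)

// Placeholder update (NOT the model's implicit solver): relax eta toward 1,
// but only in cells whose mask is nonzero (assumed here to mean "wet").
__global__ void masked_relax(float* eta, const unsigned char* wet,
                             int n, float rate) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && wet[i]) {
        eta[i] += rate * (1.0f - eta[i]);
    }
}

int main() {
    const int n = 1 << 20;          // number of grid cells (illustrative)
    const int steps = 100;          // number of time steps (illustrative)
    const int output_every = 25;    // output interval in steps (illustrative)
    const size_t bytes = n * sizeof(float);

    float *d_eta, *d_snap, *h_out;
    unsigned char* d_wet;
    CHECK(cudaMalloc(&d_eta, bytes));
    CHECK(cudaMalloc(&d_snap, bytes));     // device-side snapshot buffer
    CHECK(cudaMalloc(&d_wet, n));
    CHECK(cudaMallocHost(&h_out, bytes));  // pinned host memory for async copies
    CHECK(cudaMemset(d_eta, 0, bytes));
    CHECK(cudaMemset(d_wet, 1, n));        // mark every cell as wet

    cudaStream_t compute, copy;
    cudaEvent_t snap_ready, copy_done;
    CHECK(cudaStreamCreate(&compute));
    CHECK(cudaStreamCreate(&copy));
    CHECK(cudaEventCreateWithFlags(&snap_ready, cudaEventDisableTiming));
    CHECK(cudaEventCreateWithFlags(&copy_done, cudaEventDisableTiming));

    bool pending = false;  // true while an output copy may still be in flight
    for (int step = 1; step <= steps; ++step) {
        masked_relax<<<(n + 255) / 256, 256, 0, compute>>>(d_eta, d_wet, n, 0.01f);
        CHECK(cudaGetLastError());

        if (step % output_every == 0) {
            if (pending) {
                // Finish the previous output before its buffers are reused.
                // This is the only point where the host may have to wait.
                CHECK(cudaEventSynchronize(copy_done));
                std::printf("output: eta[0] = %f\n", h_out[0]);  // stand-in for file I/O
            }
            // Take a fast device-to-device snapshot in the compute stream...
            CHECK(cudaMemcpyAsync(d_snap, d_eta, bytes,
                                  cudaMemcpyDeviceToDevice, compute));
            CHECK(cudaEventRecord(snap_ready, compute));
            // ...then move it to the host in a second stream, so later
            // kernels in the compute stream can overlap with the transfer.
            CHECK(cudaStreamWaitEvent(copy, snap_ready, 0));
            CHECK(cudaMemcpyAsync(h_out, d_snap, bytes,
                                  cudaMemcpyDeviceToHost, copy));
            CHECK(cudaEventRecord(copy_done, copy));
            pending = true;
        }
    }

    if (pending) {
        CHECK(cudaEventSynchronize(copy_done));
        std::printf("output: eta[0] = %f\n", h_out[0]);
    }
    CHECK(cudaStreamSynchronize(compute));

    CHECK(cudaEventDestroy(snap_ready));
    CHECK(cudaEventDestroy(copy_done));
    CHECK(cudaStreamDestroy(compute));
    CHECK(cudaStreamDestroy(copy));
    CHECK(cudaFree(d_eta));
    CHECK(cudaFree(d_snap));
    CHECK(cudaFree(d_wet));
    CHECK(cudaFreeHost(h_out));
    return 0;
}
```

In this sketch the host blocks only when a new output time arrives before the previous transfer has finished, which is one way to read "asynchronous in most cases". The device-side snapshot costs one extra buffer but lets the compute stream keep overwriting `d_eta` while the host copy is in flight. The file can be compiled with `nvcc`, for example under the hypothetical name `async_output.cu`.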


Source journal

Applied Ocean Research (Earth Sciences; Engineering: Ocean)

CiteScore: 8.70
Self-citation rate: 7.00%
Articles published: 316
Review time: 59 days

About the journal: The aim of Applied Ocean Research is to encourage the submission of papers that advance the state of knowledge in a range of topics relevant to ocean engineering.