Accelerating an implicit ocean model using CUDA C
Jianbin Xie, Xingru Feng, Tianhai Gao, Changming Dong, Baoshu Yin, Changmao Wu
Applied Ocean Research, Volume 163, Article 104740 (published 2025-08-25)
DOI: 10.1016/j.apor.2025.104740
https://www.sciencedirect.com/science/article/pii/S0141118725003268
In this study, we developed an ocean model named GPU-IOCASM (GPU-Implicit Ocean Current and Storm Surge Model), which uses the finite difference method with implicit iteration to ensure simulation stability. The model also supports online nesting of multi-layer computational grids, which allows local grid refinement in critical regions to improve simulation accuracy. To maximize GPU parallelism and minimize memory overhead, we optimized the residual update algorithm, applied a mask-based conditional computation method, and designed an adaptive strategy for predicting the iteration count. When the simulation reaches a designated output time, the relevant variables are copied from GPU memory to host memory, and the GPU proceeds with the next computation without waiting for the I/O operation to complete. In most cases this process runs asynchronously, so data transfer and CPU-side operations do not interfere with GPU-based computation. Verification shows that GPU-IOCASM's results agree closely with both observed data and SCHISM results, confirming the model's reliability and accuracy. Furthermore, GPU-IOCASM achieves a speedup of more than 312 times over traditional CPU-based approaches. Unlike traditional GPU acceleration methods that require frequent data transfers between the CPU and GPU, GPU-IOCASM performs as much computation as possible on the GPU, thereby minimizing data transfer overhead and improving computational efficiency.
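The abstract names implicit iteration, mask-based conditional computation, and an optimized residual update without giving details. The C sketch below shows one way these pieces can fit together and is not the authors' code. It uses a Jacobi iteration (an assumed choice, since the abstract does not name the iterative method) on a toy implicit system. Land cells flagged by a mask are skipped, and the residual is accumulated in the same sweep that updates the solution. The names `masked_jacobi_sweep` and `masked_jacobi_solve` and the coefficient `a` are hypothetical. Each inner loop iteration corresponds to the work one CUDA thread would do in a GPU port.

```c
#include <assert.h>
#include <stddef.h>

/* Toy implicit system (an assumption, not the paper's equations): on an
   nx-by-ny grid, solve for every wet cell k
       (1 + 4a) x[k] - a * (sum of wet neighbours of x) = b[k].
   Land cells (mask == 0) are neither updated nor used as neighbours;
   points outside the grid count as zero. */

/* One Jacobi sweep from x_old into x_new. The residual of x_old is
   computed in the same pass, so convergence can be monitored without a
   second pass over memory. In a CUDA port each (i, j) would be one
   thread and max_res would come from a block-level reduction. */
double masked_jacobi_sweep(int nx, int ny, double a,
                           const unsigned char *mask, const double *b,
                           const double *x_old, double *x_new)
{
    const double diag = 1.0 + 4.0 * a;
    double max_res = 0.0;
    for (int j = 0; j < ny; ++j) {
        for (int i = 0; i < nx; ++i) {
            size_t k = (size_t)j * nx + i;
            if (!mask[k]) {              /* land cell: skip all arithmetic */
                x_new[k] = 0.0;
                continue;
            }
            double nb = 0.0;             /* sum over wet neighbours only */
            if (i > 0      && mask[k - 1])  nb += x_old[k - 1];
            if (i < nx - 1 && mask[k + 1])  nb += x_old[k + 1];
            if (j > 0      && mask[k - nx]) nb += x_old[k - nx];
            if (j < ny - 1 && mask[k + nx]) nb += x_old[k + nx];
            double res = b[k] - (diag * x_old[k] - a * nb);
            if (res < 0) res = -res;
            if (res > max_res) max_res = res;
            x_new[k] = (b[k] + a * nb) / diag;   /* Jacobi update */
        }
    }
    return max_res;
}

/* Sweeps until the residual of the previous iterate drops below tol (the
   final sweep only reduces it further) or max_iter is reached. The result
   is left in x; tmp is scratch space of the same size. Returns the number
   of sweeps performed. */
int masked_jacobi_solve(int nx, int ny, double a, const unsigned char *mask,
                        const double *b, double *x, double *tmp,
                        double tol, int max_iter)
{
    assert(nx > 0 && ny > 0 && a >= 0.0);
    size_t n = (size_t)nx * ny;
    for (int it = 0; it < max_iter; ++it) {
        double res = masked_jacobi_sweep(nx, ny, a, mask, b, x, tmp);
        for (size_t k = 0; k < n; ++k) x[k] = tmp[k];  /* a GPU code would swap pointers */
        if (res < tol) return it + 1;
    }
    return max_iter;
}
```

Fusing the residual into the update means each iteration reads the field once instead of twice. This matters on a GPU, where stencil kernels like this one are typically limited by memory bandwidth rather than arithmetic.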
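The abstract does not say whether the online nesting is one-way or two-way, or how values pass between grids. The sketch below assumes the simplest case. At each coupling time, the outer ring of a fine grid with integer refinement ratio `r` is filled by bilinear interpolation from the coarse solution, and the fine-grid solver then updates the interior. The names `nest_boundary` and `bilinear`, the offsets `i0`/`j0`, and the row-major layout are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Bilinear sample of coarse field c (ncx x ncy, row-major) at the
   fractional coarse index (x, y). */
static double bilinear(const double *c, int ncx, int ncy, double x, double y)
{
    int i = (int)x, j = (int)y;
    if (i >= ncx - 1) i = ncx - 2;   /* stay inside the last coarse cell */
    if (j >= ncy - 1) j = ncy - 2;
    double tx = x - i, ty = y - j;
    const double *r0 = c + (size_t)j * ncx;  /* row j   */
    const double *r1 = r0 + ncx;             /* row j+1 */
    return (1.0 - ty) * ((1.0 - tx) * r0[i] + tx * r0[i + 1])
         +        ty  * ((1.0 - tx) * r1[i] + tx * r1[i + 1]);
}

/* Fill the outer ring of a fine grid (nfx x nfy) whose node (fi, fj) lies
   at coarse index (i0 + fi / r, j0 + fj / r). Interior fine values are
   left untouched for the fine-grid solver. */
void nest_boundary(const double *coarse, int ncx, int ncy,
                   double *fine, int nfx, int nfy,
                   int i0, int j0, int r)
{
    assert(r > 0 && ncx >= 2 && ncy >= 2 && i0 >= 0 && j0 >= 0);
    assert(i0 + (double)(nfx - 1) / r <= ncx - 1);  /* fine grid inside coarse */
    assert(j0 + (double)(nfy - 1) / r <= ncy - 1);
    for (int fj = 0; fj < nfy; ++fj) {
        for (int fi = 0; fi < nfx; ++fi) {
            int on_ring = (fi == 0 || fi == nfx - 1 || fj == 0 || fj == nfy - 1);
            if (!on_ring) continue;
            fine[(size_t)fj * nfx + fi] =
                bilinear(coarse, ncx, ncy, i0 + (double)fi / r, j0 + (double)fj / r);
        }
    }
}
```

Because bilinear interpolation reproduces linear fields exactly, a coarse field such as `2i + 3j` is a convenient sanity check for the index mapping.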
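The abstract does not describe how the iteration count is predicted. One plausible reading, offered only as an assumption, is that the solver avoids checking convergence after every sweep. On a GPU, each check needs a reduction and a device-to-host copy that synchronizes the host with the device. The sketch below runs a predicted number of sweeps without checking, then checks in batches of `check_every`, and adapts the prediction from one time step to the next. `IterPredictor`, `predicted_solve`, and the toy `HalvingSolver` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical controller state; assumes consecutive time steps need
   similar numbers of sweeps. */
typedef struct {
    int predicted;    /* sweeps to run before the first residual check */
    int check_every;  /* sweeps between later checks (>= 1) */
    int min_pred;     /* floor for the prediction */
} IterPredictor;

typedef void (*sweep_fn)(void *ctx);      /* one solver sweep */
typedef double (*residual_fn)(void *ctx); /* one (costly) residual check */

/* Runs one solve and updates the prediction for the next time step.
   Returns the sweeps used; *n_checks (if non-NULL) gets the check count. */
int predicted_solve(IterPredictor *p, sweep_fn sweep, residual_fn residual,
                    void *ctx, double tol, int max_iter, int *n_checks)
{
    assert(p->check_every >= 1);
    int it = 0, checks = 0, converged = 0;
    /* Phase 1: run the predicted number of sweeps with no checks. */
    while (it < p->predicted && it < max_iter) { sweep(ctx); ++it; }
    /* Phase 2: check, then sweep in batches until converged. */
    for (;;) {
        ++checks;
        if (residual(ctx) < tol) { converged = 1; break; }
        if (it >= max_iter) break;
        for (int s = 0; s < p->check_every && it < max_iter; ++s) { sweep(ctx); ++it; }
    }
    if (converged && checks == 1) {
        /* Converged at the first check: the prediction may be too high,
           so probe one sweep lower next time. */
        p->predicted = (it - 1 > p->min_pred) ? it - 1 : p->min_pred;
    } else {
        /* Needed extra batches (or hit max_iter): reuse the count used. */
        p->predicted = it;
    }
    if (n_checks) *n_checks = checks;
    return it;
}

/* Toy stand-in used only to exercise the controller: its residual
   halves on every sweep. */
typedef struct { double r; } HalvingSolver;
void halving_sweep(void *ctx) { ((HalvingSolver *)ctx)->r *= 0.5; }
double halving_residual(void *ctx) { return ((HalvingSolver *)ctx)->r; }
```

The batch size `check_every` trades possible wasted sweeps against check overhead. The one-sweep downward probe keeps the prediction from staying permanently above what the solver actually needs.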
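The asynchronous output step can be sketched on the CPU with POSIX threads. This is an analogue, not the authors' CUDA implementation. The compute loop copies a snapshot into a staging buffer and keeps stepping while a writer thread handles the slow I/O. The loop blocks only if the previous snapshot is still being written when the next output time arrives, which is presumably why the abstract says "in most cases". In CUDA the copy would typically be a `cudaMemcpyAsync` into pinned host memory (allocated with `cudaMallocHost`) on a dedicated stream. `OutputQueue`, `enqueue_output`, and `run_simulation` are hypothetical names, and the checksum stands in for file writing.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define FIELD_SIZE 1024

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    double staging[FIELD_SIZE]; /* host-side copy of the output fields */
    int ready;                  /* 1 while a snapshot awaits or is being written */
    int done;                   /* set when the simulation ends */
    int written;                /* snapshots written so far */
    double checksum;            /* stand-in for the file contents */
} OutputQueue;

/* Writer thread: consumes snapshots until the simulation signals done. */
static void *writer_main(void *arg)
{
    OutputQueue *q = arg;
    pthread_mutex_lock(&q->lock);
    for (;;) {
        while (!q->ready && !q->done) pthread_cond_wait(&q->cond, &q->lock);
        if (!q->ready) break;             /* done and nothing left to write */
        pthread_mutex_unlock(&q->lock);
        /* Slow I/O runs without the lock; the producer will not touch
           staging while ready == 1. A real model would write files here. */
        double s = 0.0;
        for (int k = 0; k < FIELD_SIZE; ++k) s += q->staging[k];
        pthread_mutex_lock(&q->lock);
        q->checksum += s;
        q->written++;
        q->ready = 0;
        pthread_cond_broadcast(&q->cond); /* wake a producer waiting for staging */
    }
    pthread_mutex_unlock(&q->lock);
    return NULL;
}

/* Called by the compute loop at an output time; blocks only while the
   previous snapshot is still being written. */
static void enqueue_output(OutputQueue *q, const double *field)
{
    pthread_mutex_lock(&q->lock);
    while (q->ready) pthread_cond_wait(&q->cond, &q->lock);
    memcpy(q->staging, field, sizeof q->staging);
    q->ready = 1;
    pthread_cond_broadcast(&q->cond);
    pthread_mutex_unlock(&q->lock);
}

/* Toy time loop: update a field, stage output every `every` steps.
   Returns the number of snapshots written, or -1 if the thread fails. */
int run_simulation(int nsteps, int every, double *checksum_out)
{
    assert(every > 0);
    double field[FIELD_SIZE];
    OutputQueue q;
    memset(&q, 0, sizeof q);
    pthread_mutex_init(&q.lock, NULL);
    pthread_cond_init(&q.cond, NULL);
    pthread_t writer;
    if (pthread_create(&writer, NULL, writer_main, &q) != 0) {
        pthread_cond_destroy(&q.cond);
        pthread_mutex_destroy(&q.lock);
        return -1;
    }
    for (int k = 0; k < FIELD_SIZE; ++k) field[k] = 0.0;
    for (int step = 1; step <= nsteps; ++step) {
        for (int k = 0; k < FIELD_SIZE; ++k) field[k] += 1.0; /* stand-in for a GPU kernel */
        if (step % every == 0) enqueue_output(&q, field);
    }
    pthread_mutex_lock(&q.lock);
    q.done = 1;
    pthread_cond_broadcast(&q.cond);
    pthread_mutex_unlock(&q.lock);
    pthread_join(writer, NULL);
    pthread_cond_destroy(&q.cond);
    pthread_mutex_destroy(&q.lock);
    if (checksum_out) *checksum_out = q.checksum;
    return q.written;
}
```

A single staging buffer keeps the sketch short. Double buffering would let the compute loop stage a new snapshot even while the previous one is still being written.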
Journal description:
The aim of Applied Ocean Research is to encourage the submission of papers that advance the state of knowledge in a range of topics relevant to ocean engineering.