Multigrid methods for the Stokes problem on GPU systems
Cu Cui, Guido Kanschat
Computers & Fluids, vol. 299, Article 106703 (2025-06-12). DOI: 10.1016/j.compfluid.2025.106703
This paper presents a matrix-free multigrid method for solving the Stokes problem, discretized using $H^{\mathrm{div}}$-conforming discontinuous Galerkin methods. Our method operates directly on both the velocity and pressure spaces, eliminating the need for a global Schur complement approximation. We employ a multiplicative Schwarz smoother with vertex-patch subdomains; for efficient evaluation of the local solvers, we combine the Schur complement method with fast diagonalization. By leveraging the tensor product structure of Raviart–Thomas elements and an optimized, conflict-free shared memory access pattern, the matrix-free operator evaluation reaches a throughput of over one billion degrees of freedom per second on a single NVIDIA A100 GPU. Numerical results indicate efficiency comparable to that achieved for the three-dimensional Poisson problem.
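The abstract's local solver combines a Schur complement with fast diagonalization. The sketch below illustrates that combination on a deliberately simplified model and is not the authors' Raviart–Thomas/DG patch solver. Assumptions: the "velocity" block A is a single scalar 2D tensor-product operator `kron(M2, K1) + kron(K2, M1)` assembled from hypothetical 1D linear finite-element mass and stiffness matrices, and B is an arbitrary full-rank matrix standing in for the discrete divergence. All names (`mass_stiffness_1d`, `FastDiagonalization`, `schur_solve`) are illustrative. The key fact used is that `scipy.linalg.eigh(K, M)` returns eigenvectors S with `S.T @ M @ S = I`, so A becomes diagonal in the basis `kron(S2, S1)`.

```python
import numpy as np
from scipy.linalg import eigh, solve


def mass_stiffness_1d(n, h):
    """Hypothetical 1D linear-FE mass/stiffness matrices (n interior nodes, Dirichlet BCs)."""
    K = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h
    M = (4.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)) * h / 6.0
    return M, K


class FastDiagonalization:
    """Applies the inverse of A = kron(M2, K1) + kron(K2, M1) without forming A."""

    def __init__(self, M1, K1, M2, K2):
        # Generalized eigenproblems K S = M S diag(lam), with S.T @ M @ S = I
        self.lam1, self.S1 = eigh(K1, M1)
        self.lam2, self.S2 = eigh(K2, M2)
        self.shape = (len(self.lam2), len(self.lam1))

    def solve(self, f):
        # Row-major layout f[j * n1 + i] <-> F[j, i], so kron(P, Q) @ vec(X) = vec(P @ X @ Q.T)
        F = f.reshape(self.shape)
        Z = self.S2.T @ F @ self.S1                        # to eigenbasis: kron(S2, S1).T @ f
        Z = Z / (self.lam2[:, None] + self.lam1[None, :])  # diagonal solve with I(x)L1 + L2(x)I
        return (self.S2 @ Z @ self.S1.T).ravel()           # back: kron(S2, S1) @ z


def schur_solve(fdm, B, f, g):
    """Solve [[A, B.T], [B, 0]] [u; p] = [f; g] by eliminating u (pressure Schur complement)."""
    # A^{-1} B^T: one fast-diagonalization solve per row of B
    AinvBt = np.column_stack([fdm.solve(b) for b in B])
    S = B @ AinvBt                                     # small dense Schur complement B A^{-1} B^T
    p = solve(S, B @ fdm.solve(f) - g, assume_a="pos")
    u = fdm.solve(f - B.T @ p)                         # recover the "velocity"
    return u, p
```

In this sketch each application of the inverse of A costs a few small dense matrix products instead of a factorization of the full patch matrix, which is the usual motivation for fast diagonalization on tensor-product elements; the Schur complement is dense but only as large as the number of constraint (pressure) unknowns.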
About the journal:
Computers & Fluids is multidisciplinary. The term "fluid" is interpreted in the broadest sense. Hydro- and aerodynamics, high-speed and physical gas dynamics, turbulence and flow stability, multiphase flow, rheology, tribology and fluid-structure interaction are all of interest, provided that computer technique plays a significant role in the associated studies or design methodology.