使用 MPI/CUDA 和 FFTW 软件包并行计算二维热方程

IF 2.4 Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Frontiers in Computer Science Pub Date : 2024-01-11 DOI:10.3389/fcomp.2023.1305800

Tarik Chakkour

{"title":"使用 MPI/CUDA 和 FFTW 软件包并行计算二维热方程","authors":"Tarik Chakkour","doi":"10.3389/fcomp.2023.1305800","DOIUrl":null,"url":null,"abstract":"In this study, we present a fast algorithm for the numerical solution of the heat equation. The heat equation models the heat diffusion over time and through a given region. We engage a finite difference method to solve this equation numerically. The performance of its parallel implementation is considered using Message Passing Interface (MPI), Compute Unified Device Architecture (CUDA), and time schemes, such as Forward Euler (FE) and Runge-Kutta (RK) methods. The originality of this study is research on parallel implementations of the fourth-order Runge-Kutta method (RK4) for sparse matrices on Graphics Processing Unit (GPU) architecture. The supreme proprietary framework for GPU computing is CUDA, provided by NVIDIA. We will show three metrics through this parallelization to compare the computing performance: time-to-solution, speed-up, and performance. The spectral method is investigated by utilizing the FFTW software library, based on the computation of the fast Fourier transforms (FFT) in parallel and distributed memory architectures. Our CUDA-based FFT, named CUFFT, is performed in platforms, which is a highly optimized FFTW implementation. We will give numerical tests to reveal that this method is up-and-coming for solving the heat equation. The final result demonstrates that CUDA has a significant advantage and performance since the computational cost is tiny compared with the MPI implementation. This vital performance gain is also achieved through careful attention of managing memory communication and access.","PeriodicalId":52823,"journal":{"name":"Frontiers in Computer Science","volume":null,"pages":null},"PeriodicalIF":2.4000,"publicationDate":"2024-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Parallel computation to bidimensional heat equation using MPI/CUDA and FFTW package\",\"authors\":\"Tarik Chakkour\",\"doi\":\"10.3389/fcomp.2023.1305800\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this study, we present a fast algorithm for the numerical solution of the heat equation. The heat equation models the heat diffusion over time and through a given region. We engage a finite difference method to solve this equation numerically. The performance of its parallel implementation is considered using Message Passing Interface (MPI), Compute Unified Device Architecture (CUDA), and time schemes, such as Forward Euler (FE) and Runge-Kutta (RK) methods. The originality of this study is research on parallel implementations of the fourth-order Runge-Kutta method (RK4) for sparse matrices on Graphics Processing Unit (GPU) architecture. The supreme proprietary framework for GPU computing is CUDA, provided by NVIDIA. We will show three metrics through this parallelization to compare the computing performance: time-to-solution, speed-up, and performance. The spectral method is investigated by utilizing the FFTW software library, based on the computation of the fast Fourier transforms (FFT) in parallel and distributed memory architectures. Our CUDA-based FFT, named CUFFT, is performed in platforms, which is a highly optimized FFTW implementation. We will give numerical tests to reveal that this method is up-and-coming for solving the heat equation. The final result demonstrates that CUDA has a significant advantage and performance since the computational cost is tiny compared with the MPI implementation. This vital performance gain is also achieved through careful attention of managing memory communication and access.\",\"PeriodicalId\":52823,\"journal\":{\"name\":\"Frontiers in Computer Science\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":2.4000,\"publicationDate\":\"2024-01-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Frontiers in Computer Science\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3389/fcomp.2023.1305800\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in Computer Science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3389/fcomp.2023.1305800","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}

引用次数: 0

摘要

在本研究中，我们提出了一种热方程数值求解的快速算法。热方程模拟热量随时间和通过给定区域的扩散。我们采用有限差分法对该方程进行数值求解。我们使用消息传递接口 (MPI)、计算统一设备架构 (CUDA) 和时间方案（如前向欧拉 (FE) 和 Runge-Kutta (RK) 方法）考虑了其并行实施的性能。本研究的独创性在于研究在图形处理器（GPU）架构上并行实施稀疏矩阵的四阶 Runge-Kutta 方法（RK4）。GPU 计算的最高专有框架是英伟达公司提供的 CUDA。我们将通过并行化展示三个指标来比较计算性能：求解时间、速度提升和性能。我们利用基于并行和分布式内存架构计算快速傅立叶变换（FFT）的 FFTW 软件库研究了频谱方法。我们基于 CUDA 的 FFT（名为 CUFFT）是在平台中执行的，它是高度优化的 FFTW 实现。我们将通过数值测试来揭示这种方法在求解热方程方面的最新进展。最终结果表明，CUDA 具有显著的优势和性能，因为其计算成本与 MPI 实现相比微乎其微。这一重要的性能提升还得益于对内存通信和访问的精心管理。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Parallel computation to bidimensional heat equation using MPI/CUDA and FFTW package

In this study, we present a fast algorithm for the numerical solution of the heat equation. The heat equation models the heat diffusion over time and through a given region. We engage a finite difference method to solve this equation numerically. The performance of its parallel implementation is considered using Message Passing Interface (MPI), Compute Unified Device Architecture (CUDA), and time schemes, such as Forward Euler (FE) and Runge-Kutta (RK) methods. The originality of this study is research on parallel implementations of the fourth-order Runge-Kutta method (RK4) for sparse matrices on Graphics Processing Unit (GPU) architecture. The supreme proprietary framework for GPU computing is CUDA, provided by NVIDIA. We will show three metrics through this parallelization to compare the computing performance: time-to-solution, speed-up, and performance. The spectral method is investigated by utilizing the FFTW software library, based on the computation of the fast Fourier transforms (FFT) in parallel and distributed memory architectures. Our CUDA-based FFT, named CUFFT, is performed in platforms, which is a highly optimized FFTW implementation. We will give numerical tests to reveal that this method is up-and-coming for solving the heat equation. The final result demonstrates that CUDA has a significant advantage and performance since the computational cost is tiny compared with the MPI implementation. This vital performance gain is also achieved through careful attention of managing memory communication and access.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Frontiers in Computer Science COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS-

CiteScore

4.30

自引率

0.00%

发文量

152

审稿时长

13 weeks