Accelerating Fortran codes: A method for integrating Coarray Fortran with CUDA Fortran and OpenMP

Impact factor: 3.4; Zone 3 (Computer Science); JCR Q1 (Computer Science, Theory & Methods)

Journal of Parallel and Distributed Computing, Volume 195, Article 104977. Pub Date: 2024-09-06. DOI: 10.1016/j.jpdc.2024.104977

James McKevitt, Eduard I. Vorobyov, Igor Kulikov


Citations: 0

Abstract



Fortran's prominence in scientific computing requires strategies to ensure both that legacy codes are efficient on high-performance computing systems, and that the language remains attractive for the development of new high-performance codes. Coarray Fortran (CAF), part of the Fortran 2008 standard introduced for parallel programming, facilitates distributed memory parallelism with a syntax familiar to Fortran programmers, simplifying the transition from single-processor to multi-processor coding. This research focuses on innovating and refining a parallel programming methodology that fuses the strengths of Intel Coarray Fortran, Nvidia CUDA Fortran, and OpenMP for distributed memory parallelism, high-speed GPU acceleration and shared memory parallelism respectively. We consider the management of pageable and pinned memory, CPU-GPU affinity in NUMA multiprocessors, and robust compiler interfacing with speed optimisation. We demonstrate our method through its application to a parallelised Poisson solver and compare the methodology, implementation, and scaling performance to that of the Message Passing Interface (MPI), finding CAF offers similar speeds with easier implementation. For new codes, this approach offers a faster route to optimised parallel computing. For legacy codes, it eases the transition to parallel computing, allowing their transformation into scalable, high-performance computing applications without the need for extensive re-design or additional syntax.


Source journal

Journal of Parallel and Distributed Computing (Engineering & Technology: Computer Science, Theory & Methods)

CiteScore: 10.30
Self-citation rate: 2.60%
Articles published: 172
Review time: 12 months

About the journal: This international journal is directed to researchers, engineers, educators, managers, programmers, and users of computers who have particular interests in parallel processing and/or distributed computing. The Journal of Parallel and Distributed Computing publishes original research papers and timely review articles on the theory, design, evaluation, and use of parallel and/or distributed computing systems. The journal also features special issues on these topics, again covering the full range from the design to the use of our targeted systems.