J. Hammond, Tom Deakin, J. Cownie, Simon McIntosh-Smith
Benchmarking Fortran DO CONCURRENT on CPUs and GPUs Using BabelStream
2022 IEEE/ACM International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS)
Published: 2022-11-01
DOI: https://doi.org/10.1109/PMBS56514.2022.00013
Citations: 2
Abstract
Fortran DO CONCURRENT has emerged as a new way to achieve parallel execution of loops on CPUs and GPUs. This paper studies the performance portability of this construct on a range of processors and compares it with the incumbent models: OpenMP, OpenACC and CUDA. To do this study fairly, we implemented the BabelStream memory bandwidth benchmark from scratch, entirely in modern Fortran, for all of the models considered, which include Fortran DO CONCURRENT, as well as two variants of OpenACC, four variants of OpenMP (2 CPU and 2 GPU), CUDA Fortran, and both loop- and array-based references. BabelStream Fortran matches the C++ implementation as closely as possible, and can be used to make language-based comparisons. This paper represents one of the first detailed studies of the performance of Fortran support on heterogeneous architectures; we include results for AArch64 and x86_64 CPUs as well as AMD, Intel and NVIDIA GPU platforms.
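To make the construct concrete, the following is a minimal sketch of a BabelStream-style triad kernel written with DO CONCURRENT, followed by the array-based form of the same operation. It is an illustration only, not the authors' implementation: the program name, array length, initial values, and tolerance are assumptions chosen for the example.

```fortran
program triad_sketch
  use, intrinsic :: iso_fortran_env, only: real64, int64
  implicit none

  ! Hypothetical problem size and values, chosen only for illustration.
  integer(int64), parameter :: n = 2_int64**20
  real(real64),   parameter :: scalar = 0.4_real64
  real(real64),   parameter :: expected = 0.2_real64 + scalar * 0.1_real64
  real(real64), allocatable :: a(:), b(:), c(:)
  integer(int64) :: i

  allocate(a(n), b(n), c(n))
  a = 0.0_real64
  b = 0.2_real64
  c = 0.1_real64

  ! Loop-based triad: the iterations are independent, so the compiler
  ! is permitted (but not required) to execute them in parallel.
  do concurrent (i = 1:n)
    a(i) = b(i) + scalar * c(i)
  end do

  if (maxval(abs(a - expected)) > 1.0e-12_real64) error stop "loop triad failed"

  ! Array-based triad: the same operation in whole-array syntax.
  a = 0.0_real64
  a = b + scalar * c

  if (maxval(abs(a - expected)) > 1.0e-12_real64) error stop "array triad failed"

  print *, "triad ok"
  deallocate(a, b, c)
end program triad_sketch
```

Whether the DO CONCURRENT loop actually runs in parallel, and on which device, depends on the compiler and the options it is invoked with; the source code itself carries no directives.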