{"title":"用MPI实现高性能Coarray Fortran的源到源转换","authors":"H. Iwashita, M. Nakao, H. Murai, M. Sato","doi":"10.1145/3149457.3155888","DOIUrl":null,"url":null,"abstract":"Coarray Fortran (CAF) is a partitioned global address space (PGAS) language that is a part of standard Fortran 2008. We have implemented it as a source-to-source translator as a part of the Omni XcalebleMP compiler. Since the output is written in Fortran standard, the translator must utilize Fortran conventions such as the assumed-shape array and generic function in order to reduce both development costs and runtime overhead. The runtime library uses either GASNet, MPI-3, or Fujitsu's low-level Remote Direct Memory Access (RDMA) interface (FJ-RDMA) for one-sided communication. The Omni CAF translator and the runtime library support three types of memory managers that allocate coarray variables and register them to the communication library. The runtime library for the PUT/GET communication detects how contiguous and periodic the source and destination data are and performs communication aggregation. We measured fundamental performance levels by using EPCC Fortran Coarray microbenchmark and found our implementation of PUT/GET communication provides bandwidth as high as MPI_Send/Recv on two supercomputers. Although the small data latency was larger than the one of MPI_Send/Recv, we found that it could be reduced by using non-blocking communication for multiple coarray variables. As a result, when using 1024 processes, we achieved 27% and 42% higher performance than the original MPI code in the Himeno Benchmark classes L and XL, respectively.","PeriodicalId":314778,"journal":{"name":"Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region","volume":"12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":"{\"title\":\"A Source-to-Source Translation of Coarray Fortran with MPI for High Performance\",\"authors\":\"H. Iwashita, M. Nakao, H. Murai, M. Sato\",\"doi\":\"10.1145/3149457.3155888\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Coarray Fortran (CAF) is a partitioned global address space (PGAS) language that is a part of standard Fortran 2008. We have implemented it as a source-to-source translator as a part of the Omni XcalebleMP compiler. Since the output is written in Fortran standard, the translator must utilize Fortran conventions such as the assumed-shape array and generic function in order to reduce both development costs and runtime overhead. The runtime library uses either GASNet, MPI-3, or Fujitsu's low-level Remote Direct Memory Access (RDMA) interface (FJ-RDMA) for one-sided communication. The Omni CAF translator and the runtime library support three types of memory managers that allocate coarray variables and register them to the communication library. The runtime library for the PUT/GET communication detects how contiguous and periodic the source and destination data are and performs communication aggregation. We measured fundamental performance levels by using EPCC Fortran Coarray microbenchmark and found our implementation of PUT/GET communication provides bandwidth as high as MPI_Send/Recv on two supercomputers. Although the small data latency was larger than the one of MPI_Send/Recv, we found that it could be reduced by using non-blocking communication for multiple coarray variables. 
As a result, when using 1024 processes, we achieved 27% and 42% higher performance than the original MPI code in the Himeno Benchmark classes L and XL, respectively.\",\"PeriodicalId\":314778,\"journal\":{\"name\":\"Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region\",\"volume\":\"12 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-01-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"8\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3149457.3155888\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3149457.3155888","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Coarray Fortran (CAF) is a partitioned global address space (PGAS) language that is part of the Fortran 2008 standard. We have implemented it as a source-to-source translator that is part of the Omni XcalableMP compiler. Since the output is written in standard Fortran, the translator must exploit Fortran features such as assumed-shape arrays and generic functions in order to reduce both development costs and runtime overhead. The runtime library uses either GASNet, MPI-3, or Fujitsu's low-level Remote Direct Memory Access (RDMA) interface (FJ-RDMA) for one-sided communication. The Omni CAF translator and the runtime library support three types of memory managers that allocate coarray variables and register them with the communication library. The runtime library for PUT/GET communication detects how contiguous and periodic the source and destination data are and aggregates the communication accordingly. We measured fundamental performance with the EPCC Fortran Coarray microbenchmark and found that our PUT/GET implementation provides bandwidth as high as that of MPI_Send/Recv on two supercomputers. Although the small-data latency was larger than that of MPI_Send/Recv, we found that it could be reduced by using non-blocking communication over multiple coarray variables. As a result, when using 1024 processes, we achieved 27% and 42% higher performance than the original MPI code in the Himeno Benchmark classes L and XL, respectively.
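To make the abstract's workflow concrete, the following Coarray Fortran sketch (not taken from the paper) shows the kind of coindexed assignment the translator would rewrite into a runtime PUT. The comment illustrates, purely as an assumption, how such a statement might be lowered to a runtime call; the names coarray_put and descriptor_of_a are hypothetical placeholders, not the actual Omni API.

    program caf_put_sketch
      implicit none
      real, allocatable :: a(:,:)[:]      ! allocatable coarray, symmetric on all images
      integer :: me, np, left

      allocate(a(128,128)[*])
      me   = this_image()
      np   = num_images()
      left = merge(np, me - 1, me == 1)   ! periodic left neighbor

      a = real(me)
      sync all

      ! PUT: write this image's first row into the halo row of the left image.
      ! In column-major storage this section is strided, so a runtime of the
      ! kind described in the abstract would detect the stride and aggregate
      ! the transfer, e.g. via a call such as
      !     call coarray_put(descriptor_of_a, a(1,:), left)   ! hypothetical API
      ! backed by GASNet, MPI-3 one-sided communication, or FJ-RDMA.
      a(128, :)[left] = a(1, :)

      sync all
      if (me == 1) print *, 'halo exchange finished on', np, 'images'
    end program caf_put_sketch

Passing the section to an assumed-shape dummy argument would let the runtime query its shape and strides through the array descriptor rather than through generated bookkeeping code, which is consistent with the abstract's point that standard Fortran features help keep both development cost and runtime overhead low.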