Accelerating Multi-Process Communication for Parallel 3-D FFT

2021 Workshop on Exascale MPI (ExaMPI). Published: 2021-11-01. DOI: 10.1109/ExaMPI54564.2021.00011

Alan Ayala, S. Tomov, M. Stoyanov, A. Haidar, J. Dongarra


Citations: 2

Abstract




Today's largest and most powerful supercomputers are built on heterogeneous platforms, and harnessing the combined power of multi-core CPUs and GPUs has greatly accelerated large-scale applications. On these architectures, however, inter-processor communication becomes a bottleneck for parallel algorithms such as the Fast Fourier Transform (FFT), limiting their scalability. In this paper, we present techniques for reducing the cost of multi-process communication during FFT computation, taking into account hybrid network connections such as those expected on upcoming exascale machines. These techniques include algorithmic tuning, using phase diagrams; parametric tuning, using different FFT settings; and MPI distribution tuning, based on the FFT size and the available computational resources. We present several experiments on the Summit supercomputer at Oak Ridge National Laboratory, using up to 40,960 IBM Power9 cores and 6,144 NVIDIA V100 GPUs.
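The abstract names MPI distribution tuning based on FFT size and available resources but does not describe the mechanism. As a hedged illustration only, the sketch below shows one common way to make such a choice for an N0 x N1 x N2 FFT on p MPI ranks. It uses a slab layout (a single global transpose) when p fits along the split axes. Otherwise it uses the most nearly square p1 x p2 pencil grid, similar in spirit to what `MPI_Dims_create` produces. The function names `grid_factorizations` and `choose_layout`, the slab-first preference, and the conservative feasibility rules are assumptions made for this example, not the authors' method.

```python
"""Hypothetical sketch: choose an MPI process layout for a 3-D FFT.

Assumptions (not from the paper): slabs are preferred because they need
one global transpose instead of two, and a near-square pencil grid is used
otherwise to balance the two sub-communicator all-to-all exchanges.
"""
from math import isqrt


def grid_factorizations(p):
    """Return every (p1, p2) with p1 * p2 == p and p1 <= p2, by increasing p1."""
    pairs = []
    for p1 in range(1, isqrt(p) + 1):
        if p % p1 == 0:
            pairs.append((p1, p // p1))
    return pairs


def choose_layout(shape, p):
    """Return ("slab", (1, p)) or ("pencil", (p1, p2)) for p ranks.

    shape is the global FFT size (n0, n1, n2).
    """
    n0, n1, _ = shape
    # Slab: axis 0 is split over all p ranks, and after the transpose
    # axis 1 is split instead, so both need at least one plane per rank.
    if p <= min(n0, n1):
        return "slab", (1, p)
    # Pencil: the most nearly square grid is the last factorization.
    # It has the smallest p2, so if it is infeasible, every grid is.
    p1, p2 = grid_factorizations(p)[-1]
    # Conservative check: each split axis needs one grid line per rank.
    if p2 > min(shape):
        raise ValueError(f"{p} ranks is too many for an FFT of size {shape}")
    return "pencil", (p1, p2)


if __name__ == "__main__":
    print(choose_layout((64, 64, 64), 32))       # fits as slabs
    print(choose_layout((512, 512, 512), 6144))  # needs a pencil grid
```

With 6,144 ranks and a 512^3 FFT, the sketch returns `("pencil", (64, 96))`, because 6,144 = 64 x 96 is the most nearly square factorization. For a 64^3 FFT, the same rank count raises `ValueError`, since no grid leaves at least one line per rank on every axis. A real tuner would replace these fixed rules with measured costs for the target network.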

