Scaling Parallel 3-D FFT with Non-Blocking MPI Collectives

2014 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems

Pub Date: 2014-11-16

DOI: 10.1109/ScalA.2014.9

Sukhyun Song, J. Hollingsworth

Citations: 6

Abstract

This paper describes a new method for scalable high-performance parallel 3-D FFT. We use a 2-D decomposition of 3-D arrays to scale to a large number of cores. To achieve high performance, we use non-blocking MPI all-to-all operations and exploit computation-communication overlap. We also auto-tune our 3-D FFT code efficiently over a large parameter space to cope with the complex trade-offs involved in optimizing the code for various system environments. In experiments with up to 32,768 cores, our method computes parallel 3-D FFT up to 1.83× faster than the FFTW library.
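The 2-D (pencil) decomposition mentioned in the abstract can be illustrated with a small serial simulation. The sketch below is not the authors' code: it assumes a cubic grid whose side is divisible by both process-grid dimensions, uses a naive O(n²) DFT in place of an optimized FFT kernel, and replaces each MPI all-to-all transpose with a plain dictionary hand-off, so it shows neither the non-blocking overlap nor the auto-tuning. All names (`dft`, `block`, `pencil_fft3d`, `direct_dft3d`) are hypothetical.

```python
import cmath
from itertools import product


def dft(line):
    """Naive O(n^2) 1-D DFT; a stand-in for an optimized FFT kernel."""
    n = len(line)
    return [
        sum(v * cmath.exp(-2j * cmath.pi * j * k / n) for j, v in enumerate(line))
        for k in range(n)
    ]


def block(index, parts, n):
    """Indices owned by part `index` when n points are split into equal blocks."""
    size = n // parts
    return range(index * size, (index + 1) * size)


def pencil_fft3d(a, n, pr, pc):
    """3-D DFT of a[(x, y, z)] on an n^3 grid, simulated on a pr x pc process grid.

    Assumes n is divisible by pr and pc. In each stage every simulated rank
    owns complete lines along one axis (a "pencil") and transforms them locally.
    """
    data = dict(a)
    # (axis transformed, (axis split over grid rows, axis split over grid columns))
    stages = [(2, (0, 1)), (1, (0, 2)), (0, (1, 2))]
    for axis, (row_axis, col_axis) in stages:
        new = {}
        for r, c in product(range(pr), range(pc)):  # one iteration per rank
            for i, j in product(block(r, pr, n), block(c, pc, n)):

                def key(t):
                    idx = [0, 0, 0]
                    idx[row_axis], idx[col_axis], idx[axis] = i, j, t
                    return tuple(idx)

                line = dft([data[key(t)] for t in range(n)])
                for t in range(n):
                    new[key(t)] = line[t]
        # In an MPI code this hand-off would be an all-to-all transpose
        # among the ranks of one row or column of the process grid.
        data = new
    return data


def direct_dft3d(a, n):
    """Reference 3-D DFT straight from the definition, for checking only."""
    def w(p):
        return cmath.exp(-2j * cmath.pi * p / n)

    return {
        (kx, ky, kz): sum(v * w(kx * x + ky * y + kz * z) for (x, y, z), v in a.items())
        for kx, ky, kz in product(range(n), repeat=3)
    }
```

Because each stage needs only complete lines along a single axis, up to `pr * pc` ranks can work independently between transposes, whereas a 1-D (slab) decomposition of an n³ grid is limited to n ranks. In a real implementation the transposes dominate the cost, which is why the abstract stresses overlapping them with computation.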
