Scaling Parallel 3-D FFT with Non-Blocking MPI Collectives

2014 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems. Pub Date: 2014-11-16. DOI: 10.1109/ScalA.2014.9

Sukhyun Song, J. Hollingsworth


Citations: 6

Abstract


Scaling Parallel 3-D FFT with Non-Blocking MPI Collectives

This paper describes a new method for scalable high-performance parallel 3-D FFT. We use a 2-D decomposition of 3-D arrays to increase scaling to a large number of cores. In order to achieve high performance, we use non-blocking MPI all-to-all operations and exploit computation-communication overlap. We also auto-tune our 3-D FFT code efficiently in a large parameter space and cope with the complex trade-off in optimizing our code in various system environments. According to experimental results with up to 32,768 cores, our method computes parallel 3-D FFT faster than the FFTW library by up to 1.83×.
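The abstract gives no implementation details, so the following is only a single-process Python sketch of the 2-D (pencil) decomposition idea, not the authors' code. All function names (`owner`, `redistribute`, `local_fft`, `pencil_fft3d`, `dft3d`) are hypothetical, the process grid `pr × pc` is simulated with dictionaries, and `redistribute` is a stand-in for the MPI all-to-all exchange. Computation-communication overlap and auto-tuning are not modeled here.

```python
import cmath
from itertools import product

def dft(line):
    """Naive O(n^2) 1-D DFT; enough for a tiny illustration."""
    n = len(line)
    return [sum(line[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def owner(idx, axis, n, pr, pc):
    """Simulated rank (r, c) that owns global index idx when pencils run along `axis`."""
    a, b = [i for i in range(3) if i != axis]  # the two distributed axes
    return (idx[a] // (n // pr), idx[b] // (n // pc))

def redistribute(local, axis, n, pr, pc):
    """Stand-in for the all-to-all: move each element to its owner in the new layout."""
    new = {rank: {} for rank in local}
    for block in local.values():
        for idx, value in block.items():
            new[owner(idx, axis, n, pr, pc)][idx] = value
    return new

def local_fft(block, axis, n):
    """Transform every line along `axis` held by one rank; each line must be complete."""
    others = [i for i in range(3) if i != axis]
    out = {}
    for key in {tuple(idx[i] for i in others) for idx in block}:
        def full(t):
            idx = [0, 0, 0]
            idx[axis] = t
            idx[others[0]], idx[others[1]] = key
            return tuple(idx)
        line = dft([block[full(t)] for t in range(n)])
        for k in range(n):
            out[full(k)] = line[k]
    return out

def pencil_fft3d(a, n, pr, pc):
    """3-D DFT of an n^3 array (dict keyed by (x, y, z)) on a simulated pr x pc grid."""
    assert n % pr == 0 and n % pc == 0, "sketch assumes evenly divisible blocks"
    local = {rank: {} for rank in product(range(pr), range(pc))}
    for idx in product(range(n), repeat=3):
        local[owner(idx, 0, n, pr, pc)][idx] = a[idx]  # start with x-pencils
    for axis in range(3):
        if axis > 0:
            local = redistribute(local, axis, n, pr, pc)  # x -> y -> z pencils
        local = {rank: local_fft(block, axis, n) for rank, block in local.items()}
    result = {}
    for block in local.values():
        result.update(block)
    return result

def dft3d(a, n):
    """Direct 3-D DFT used only as a reference for checking the sketch."""
    w = lambda j, k: cmath.exp(-2j * cmath.pi * j * k / n)
    return {k: sum(a[j] * w(j[0], k[0]) * w(j[1], k[1]) * w(j[2], k[2])
                   for j in product(range(n), repeat=3))
            for k in product(range(n), repeat=3)}

if __name__ == "__main__":
    n = 4
    a = {idx: complex(idx[0] + 2 * idx[1], idx[2]) for idx in product(range(n), repeat=3)}
    got, ref = pencil_fft3d(a, n, 2, 2), dft3d(a, n)
    print("max abs difference:", max(abs(got[k] - ref[k]) for k in ref))
```

In this layout each redistribution only moves data between simulated ranks that share one grid coordinate, which is why a 2-D decomposition can use two smaller all-to-all exchanges instead of one global exchange. A 1-D (slab) decomposition of an n×n×n array can use at most n ranks, whereas the pencil layout allows up to n² ranks.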


Source journal

2014 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems

Self-citation rate

0.00%

Number of papers published