块递归算法在分布式内存机器上的高效通信实现

Proceedings of 1994 International Conference on Parallel and Distributed Systems Pub Date : 1994-12-19 DOI:10.1109/ICPADS.1994.590060

S. Gupta, Chua-Huang Huang, Rodney W. Johnson, P. Sadayappan

{"title":"块递归算法在分布式内存机器上的高效通信实现","authors":"S. Gupta, Chua-Huang Huang, Rodney W. Johnson, P. Sadayappan","doi":"10.1109/ICPADS.1994.590060","DOIUrl":null,"url":null,"abstract":"This paper presents a design methodology for developing efficient distributed-memory parallel programs for block-recursive algorithms such as the fast Fourier transform and bitonic sort. This design methodology is specifically suited for most modern supercomputers having a distributed-memory architecture with circuit-switched or wormhole routed mesh or hypercube interconnection network. A mathematical framework based on the tenser product and other matrix operations is used for representing algorithms. Communication-efficient implementations with effectively overlapped computation and communication are achieved by manipulating the mathematical representation using the tenser algebra. Performance results for FFT programs on the Intel iPSC/860 and Intel Paragon are presented.","PeriodicalId":154429,"journal":{"name":"Proceedings of 1994 International Conference on Parallel and Distributed Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1994-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Communication-efficient implementation of block recursive algorithms on distributed-memory machines\",\"authors\":\"S. Gupta, Chua-Huang Huang, Rodney W. Johnson, P. Sadayappan\",\"doi\":\"10.1109/ICPADS.1994.590060\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper presents a design methodology for developing efficient distributed-memory parallel programs for block-recursive algorithms such as the fast Fourier transform and bitonic sort. This design methodology is specifically suited for most modern supercomputers having a distributed-memory architecture with circuit-switched or wormhole routed mesh or hypercube interconnection network. A mathematical framework based on the tenser product and other matrix operations is used for representing algorithms. Communication-efficient implementations with effectively overlapped computation and communication are achieved by manipulating the mathematical representation using the tenser algebra. Performance results for FFT programs on the Intel iPSC/860 and Intel Paragon are presented.\",\"PeriodicalId\":154429,\"journal\":{\"name\":\"Proceedings of 1994 International Conference on Parallel and Distributed Systems\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1994-12-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of 1994 International Conference on Parallel and Distributed Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICPADS.1994.590060\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of 1994 International Conference on Parallel and Distributed Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICPADS.1994.590060","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 2

摘要

本文提出了一种为块递归算法(如快速傅立叶变换和双次排序)开发高效分布式存储并行程序的设计方法。这种设计方法特别适合大多数具有分布式内存架构的现代超级计算机，这些架构具有电路交换或虫洞路由网格或超立方体互连网络。基于张量积和其他矩阵运算的数学框架用于表示算法。通过使用张量代数对数学表示进行操作，实现了计算和通信有效重叠的高效通信实现。给出了在Intel iPSC/860和Intel Paragon上运行FFT程序的性能结果。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Communication-efficient implementation of block recursive algorithms on distributed-memory machines

This paper presents a design methodology for developing efficient distributed-memory parallel programs for block-recursive algorithms such as the fast Fourier transform and bitonic sort. This design methodology is specifically suited for most modern supercomputers having a distributed-memory architecture with circuit-switched or wormhole routed mesh or hypercube interconnection network. A mathematical framework based on the tenser product and other matrix operations is used for representing algorithms. Communication-efficient implementations with effectively overlapped computation and communication are achieved by manipulating the mathematical representation using the tenser algebra. Performance results for FFT programs on the Intel iPSC/860 and Intel Paragon are presented.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of 1994 International Conference on Parallel and Distributed Systems

自引率

0.00%

发文量