Packing/unpacking information generation for efficient generalized kr→r and r→kr array redistribution

Proceedings. Frontiers '99. Seventh Symposium on the Frontiers of Massively Parallel Computation. Pub Date: 1999-02-21. DOI: 10.1109/FMPC.1999.750588

Ching-Hsien Hsu, Yeh-Ching Chung, C. Dow



Abstract


Array redistribution is usually required to enhance algorithm performance in many parallel programs on distributed-memory multicomputers. Because it is performed at run time, there is a performance tradeoff between the efficiency of the new data decomposition for a subsequent phase of an algorithm and the cost of redistributing data among processors. In this paper, we present efficient methods to generate the packing/unpacking information for BLOCK-CYCLIC(kr) to BLOCK-CYCLIC(r) and BLOCK-CYCLIC(r) to BLOCK-CYCLIC(kr) redistribution with arbitrary source/destination processor sets. The most significant improvement in this paper is that a processor does not need to construct the send/receive data sets for a redistribution. Based on the packing/unpacking information derived for the kr→r and r→kr redistributions, a processor can pack array elements into messages, and unpack them from messages, directly. To evaluate the performance of our methods, we implemented them along with the PITFALLS method and Prylli's method on an IBM SP2 parallel machine. The experimental results show that our algorithms outperform the PITFALLS method and Prylli's method for all test samples.
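To make the problem concrete, the sketch below shows what a BLOCK-CYCLIC(kr) to BLOCK-CYCLIC(r) redistribution has to determine: which locally stored elements each source processor sends to each destination processor. It is a brute-force illustration only, not the method of the paper, which avoids exactly this kind of element-by-element enumeration. The function names (`owner`, `local_index`, `packing_table`) and the example sizes are assumptions chosen for illustration.

```python
def owner(i, block, nprocs):
    """Processor owning global index i under BLOCK-CYCLIC(block) on nprocs processors."""
    return (i // block) % nprocs


def local_index(i, block, nprocs):
    """Position of global index i inside its owner's local array."""
    return (i // (block * nprocs)) * block + i % block


def packing_table(n, src_block, src_procs, dst_block, dst_procs):
    """Brute force (hypothetical helper): map each (source, destination)
    processor pair to the source-local indices to pack, in global order."""
    table = {}
    for i in range(n):
        s = owner(i, src_block, src_procs)
        d = owner(i, dst_block, dst_procs)
        table.setdefault((s, d), []).append(local_index(i, src_block, src_procs))
    return table


# kr -> r with k = 2 and r = 2: BLOCK-CYCLIC(4) on 2 source processors
# redistributed to BLOCK-CYCLIC(2) on 3 destination processors, 24 elements.
table = packing_table(24, 4, 2, 2, 3)
print(table[(0, 0)])  # local indices source processor 0 packs for destination 0
```

Enumerating all n elements costs time proportional to the array size on every redistribution. The abstract's claim is that the packing/unpacking information can be generated without building such send/receive sets explicitly.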
