在数据并行程序上有效地生成通信集

Proceedings of 3rd International Conference on Algorithms and Architectures for Parallel Processing Pub Date : 1997-12-10 DOI:10.1109/ICAPP.1997.651505

Tsung-Chuan Huang, L. Shiu, Cherng-Haw Yu

{"title":"在数据并行程序上有效地生成通信集","authors":"Tsung-Chuan Huang, L. Shiu, Cherng-Haw Yu","doi":"10.1109/ICAPP.1997.651505","DOIUrl":null,"url":null,"abstract":"Generating local memory access sequences and communication sets efficiently is an important issue while compiling a data-parallel language into a SPMD (Single Program Multiple Data) code. Recently, several approaches have been presented; they are based on the case in which array references are distributed across arbitrary number of processors with arbitrary block sizes using block-cyclic distribution. Typically, in order to generate explicit communication sets, each node program has to scan over the local memory access sequences. In this paper, we focus on two cases. First, array references are aligned to a common template and this template is distributed across processors using block-cyclic distribution. Second, array references are distributed across the same number of processors with same block size. The first case is further classified into one-level and two-level mappings. We construct a block state graph to generate communication sets by scanning only a portion of local memory access sequence. In one-level mappings and the second case, we only need to scan the active elements among the first s local active blocks; while in two-level mappings, only need to scan the active elements among the first /spl alpha/*s local active blocks, where s is the stride of regular section and a is the stride of alignment function. As a result, the efficiency can be greatly improved.","PeriodicalId":325978,"journal":{"name":"Proceedings of 3rd International Conference on Algorithms and Architectures for Parallel Processing","volume":"4 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1997-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Generating communication sets efficiently on data-parallel programs\",\"authors\":\"Tsung-Chuan Huang, L. Shiu, Cherng-Haw Yu\",\"doi\":\"10.1109/ICAPP.1997.651505\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Generating local memory access sequences and communication sets efficiently is an important issue while compiling a data-parallel language into a SPMD (Single Program Multiple Data) code. Recently, several approaches have been presented; they are based on the case in which array references are distributed across arbitrary number of processors with arbitrary block sizes using block-cyclic distribution. Typically, in order to generate explicit communication sets, each node program has to scan over the local memory access sequences. In this paper, we focus on two cases. First, array references are aligned to a common template and this template is distributed across processors using block-cyclic distribution. Second, array references are distributed across the same number of processors with same block size. The first case is further classified into one-level and two-level mappings. We construct a block state graph to generate communication sets by scanning only a portion of local memory access sequence. In one-level mappings and the second case, we only need to scan the active elements among the first s local active blocks; while in two-level mappings, only need to scan the active elements among the first /spl alpha/*s local active blocks, where s is the stride of regular section and a is the stride of alignment function. As a result, the efficiency can be greatly improved.\",\"PeriodicalId\":325978,\"journal\":{\"name\":\"Proceedings of 3rd International Conference on Algorithms and Architectures for Parallel Processing\",\"volume\":\"4 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1997-12-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of 3rd International Conference on Algorithms and Architectures for Parallel Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICAPP.1997.651505\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of 3rd International Conference on Algorithms and Architectures for Parallel Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICAPP.1997.651505","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

摘要

在将数据并行语言编译成单程序多数据(SPMD)代码时，有效地生成本地存储器访问序列和通信集是一个重要问题。最近，提出了几种方法;它们基于使用块循环分布的情况，其中数组引用分布在具有任意块大小的任意数量的处理器上。通常，为了生成显式通信集，每个节点程序都必须扫描本地内存访问序列。在本文中，我们主要关注两种情况。首先，数组引用对齐到一个公共模板，这个模板使用块循环分布分布在处理器之间。其次，数组引用分布在具有相同块大小的相同数量的处理器上。第一种情况进一步分为一级映射和二级映射。我们构造了一个块状态图，通过扫描局部存储器访问序列的一部分来生成通信集。在一级映射和二级映射中，我们只需要扫描前5个本地活动块中的活动元素;而在两级映射中，只需要扫描第一个/spl alpha/*s个局部活动块中的活动元素，其中s为规则段的步长，a为对齐函数的步长。因此，效率可以大大提高。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Generating communication sets efficiently on data-parallel programs

Generating local memory access sequences and communication sets efficiently is an important issue while compiling a data-parallel language into a SPMD (Single Program Multiple Data) code. Recently, several approaches have been presented; they are based on the case in which array references are distributed across arbitrary number of processors with arbitrary block sizes using block-cyclic distribution. Typically, in order to generate explicit communication sets, each node program has to scan over the local memory access sequences. In this paper, we focus on two cases. First, array references are aligned to a common template and this template is distributed across processors using block-cyclic distribution. Second, array references are distributed across the same number of processors with same block size. The first case is further classified into one-level and two-level mappings. We construct a block state graph to generate communication sets by scanning only a portion of local memory access sequence. In one-level mappings and the second case, we only need to scan the active elements among the first s local active blocks; while in two-level mappings, only need to scan the active elements among the first /spl alpha/*s local active blocks, where s is the stride of regular section and a is the stride of alignment function. As a result, the efficiency can be greatly improved.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of 3rd International Conference on Algorithms and Architectures for Parallel Processing

自引率

0.00%

发文量