Generating communication sets efficiently on data-parallel programs

Tsung-Chuan Huang, L. Shiu, Cherng-Haw Yu
Proceedings of 3rd International Conference on Algorithms and Architectures for Parallel Processing
Published: 1997-12-10
DOI: 10.1109/ICAPP.1997.651505 (https://doi.org/10.1109/ICAPP.1997.651505)
Citations: 1

Abstract

Generating local memory access sequences and communication sets efficiently is an important issue when compiling a data-parallel language into SPMD (Single Program Multiple Data) code. Several approaches have been presented recently; they address the general case in which array references are distributed across an arbitrary number of processors with arbitrary block sizes using block-cyclic distribution. Typically, to generate explicit communication sets, each node program has to scan over the local memory access sequences. In this paper, we focus on two cases. First, array references are aligned to a common template, and this template is distributed across processors using block-cyclic distribution. Second, array references are distributed across the same number of processors with the same block size. The first case is further classified into one-level and two-level mappings. We construct a block state graph to generate communication sets by scanning only a portion of the local memory access sequence. In one-level mappings and in the second case, we only need to scan the active elements among the first s local active blocks; in two-level mappings, we only need to scan the active elements among the first α·s local active blocks, where s is the stride of the regular section and α is the stride of the alignment function. As a result, the efficiency can be greatly improved.
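
To make the terms in the abstract concrete, the following is a minimal Python sketch of the brute-force baseline that the abstract says node programs typically rely on: scanning the entire access sequence. It is not the block state graph method of the paper, which the abstract does not spell out. The sketch assumes a block-cyclic distribution with block size k over p processors, zero-based global indices, and an assignment of the form A(la + j*sa) = B(lb + j*sb); all function and parameter names are hypothetical and chosen for illustration only.

```python
from collections import defaultdict


def owner(i, p, k):
    """Processor owning global index i under a block-cyclic distribution
    with block size k over p processors (assumed zero-based)."""
    return (i // k) % p


def local_address(i, p, k):
    """Local memory offset of global index i on its owning processor."""
    return (i // (p * k)) * k + i % k


def local_access_sequence(m, lo, hi, s, p, k):
    """Local addresses on processor m touched by the regular section
    lo:hi:s (hi inclusive), found by scanning every section element."""
    return [local_address(i, p, k)
            for i in range(lo, hi + 1, s)
            if owner(i, p, k) == m]


def communication_sets(la, sa, lb, sb, n, p, k):
    """Brute-force send sets for A(la + j*sa) = B(lb + j*sb), j = 0..n-1.

    Returns {(sender, receiver): [global indices of B to send]} for the
    element pairs whose source and destination processors differ.
    """
    sets = defaultdict(list)
    for j in range(n):
        ia, ib = la + j * sa, lb + j * sb
        src, dst = owner(ib, p, k), owner(ia, p, k)
        if src != dst:
            sets[(src, dst)].append(ib)
    return dict(sets)


if __name__ == "__main__":
    # Two processors, block size 2, section 0:11:3.
    print(local_access_sequence(0, 0, 11, 3, 2, 2))
    print(local_access_sequence(1, 0, 11, 3, 2, 2))
    # A(j) = B(2*j) for j = 0..3 on the same distribution.
    print(communication_sets(0, 1, 0, 2, 4, 2, 2))
```

The cost of this baseline grows with the length of the whole section, which is the overhead the abstract proposes to avoid by scanning only the active elements in the first s (or α·s) local active blocks.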