Generating communication sets efficiently on data-parallel programs
Authors: Tsung-Chuan Huang, L. Shiu, Cherng-Haw Yu
Published in: Proceedings of 3rd International Conference on Algorithms and Architectures for Parallel Processing
Publication date: 1997-12-10
DOI: 10.1109/ICAPP.1997.651505 (https://doi.org/10.1109/ICAPP.1997.651505)
Generating local memory access sequences and communication sets efficiently is an important issue when compiling a data-parallel language into SPMD (Single Program Multiple Data) code. Several approaches have been presented recently; they address the case in which array references are distributed across an arbitrary number of processors with arbitrary block sizes using block-cyclic distribution. Typically, to generate explicit communication sets, each node program has to scan over its local memory access sequence. In this paper, we focus on two cases. First, array references are aligned to a common template, and this template is distributed across processors using block-cyclic distribution. Second, array references are distributed across the same number of processors with the same block size. The first case is further classified into one-level and two-level mappings. We construct a block state graph that generates communication sets by scanning only a portion of the local memory access sequence. In one-level mappings and in the second case, we only need to scan the active elements among the first s local active blocks; in two-level mappings, we only need to scan the active elements among the first α·s local active blocks, where s is the stride of the regular section and α is the stride of the alignment function. As a result, generating communication sets requires far less scanning, which can greatly improve efficiency.
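To make the terms in the abstract concrete, the following is a minimal sketch of the naive baseline it describes, in which every element of a regular section is scanned. It is not the paper's block state graph method. It assumes the abstract's second case (both arrays distributed over the same p processors with the same block size k), zero-based global indices, and an assignment of the form A(l_a + j*s_a) = B(l_b + j*s_b). All function and parameter names are hypothetical.

```python
from collections import defaultdict


def owner(i, p, k):
    """Processor that owns global index i under a block-cyclic
    distribution with block size k over p processors."""
    return (i // k) % p


def local_address(i, p, k):
    """Offset of global index i inside its owner's local memory."""
    return (i // (p * k)) * k + i % k


def local_access_sequence(l, u, s, p, k, m):
    """Local addresses on processor m touched by the regular section
    l:u:s (inclusive upper bound), found by scanning every element."""
    return [local_address(i, p, k)
            for i in range(l, u + 1, s)
            if owner(i, p, k) == m]


def communication_sets(l_a, s_a, l_b, s_b, n, p, k):
    """Naive communication sets for A(l_a + j*s_a) = B(l_b + j*s_b),
    j = 0..n-1. Maps (source, destination) processor pairs to lists of
    (global index in B, global index in A) that must be transferred."""
    sets = defaultdict(list)
    for j in range(n):
        ia = l_a + j * s_a
        ib = l_b + j * s_b
        src, dst = owner(ib, p, k), owner(ia, p, k)
        if src != dst:  # element lives on a different processor
            sets[(src, dst)].append((ib, ia))
    return dict(sets)


if __name__ == "__main__":
    # Two processors, block size 2: P0 owns 0,1,4,5,8,9,... and P1 owns 2,3,6,7,...
    print(local_access_sequence(0, 11, 3, p=2, k=2, m=0))
    print(communication_sets(0, 1, 0, 2, n=4, p=2, k=2))
```

The cost of this baseline grows with the number of elements in the section. The abstract's contribution is to bound the scan to the first s (or α·s) local active blocks instead.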