{"title":"Efficient algorithms for multi-dimensional block-cyclic redistribution of arrays","authors":"Y. Lim, Neungsoo Park, V. Prasanna","doi":"10.1109/ICPP.1997.622650","DOIUrl":null,"url":null,"abstract":"We present a uniform framework for a classical problem, redistribution of a multi-dimensional array. Using a generalized circulant matrix formalism, we derive efficient direct, indirect and hybrid contention-free communication schedules. Our indirect schedule reduces the number of communication steps significantly compared with the previous approaches. Our approach exploits the regularity of the block-cyclic redistribution to minimize the index computation overheads. For the case of 2-d redistribution, when the block size increases by factors of K/sub 1/ and K/sub 2/ along each dimension and the process topology remains fixed, our indirect schedule performs the redistribution in O(log(K/sub 1/K/sub 2/)) communication steps. For the case of fixed block size and the processor topology is transposed, our indirect schedule results in O(log(L/G)) communication steps. Implementations of our algorithms on the IBM SP-2 show superior performance over previous approaches.","PeriodicalId":221761,"journal":{"name":"Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICPP.1997.622650","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4
Abstract
We present a uniform framework for a classical problem, redistribution of a multi-dimensional array. Using a generalized circulant matrix formalism, we derive efficient direct, indirect and hybrid contention-free communication schedules. Our indirect schedule reduces the number of communication steps significantly compared with the previous approaches. Our approach exploits the regularity of the block-cyclic redistribution to minimize the index computation overheads. For the case of 2-d redistribution, when the block size increases by factors of K/sub 1/ and K/sub 2/ along each dimension and the process topology remains fixed, our indirect schedule performs the redistribution in O(log(K/sub 1/K/sub 2/)) communication steps. For the case of fixed block size and the processor topology is transposed, our indirect schedule results in O(log(L/G)) communication steps. Implementations of our algorithms on the IBM SP-2 show superior performance over previous approaches.