Data redistribution using MPI user-defined types
Chu-Sing Yang, Sheng-Wen Bai
First International Symposium on Cyber Worlds, 2002. Proceedings. Published 2002-11-06.
DOI: 10.1109/CW.2002.1180859 (https://doi.org/10.1109/CW.2002.1180859)
Citations: 0
Abstract
In many parallel programs, run-time data redistribution is required to enhance data locality and reduce remote memory access on distributed-memory multicomputers. Research on data redistribution algorithms has matured, and the time required to generate data sets and processor sets is now much lower than before. As a result, packing and unpacking account for a relatively large share of the redistribution cost. In this paper we present methods to perform BLOCK-CYCLIC(s) to BLOCK-CYCLIC(t) redistribution using MPI user-defined types. This approach reduces the memory buffers required and avoids unnecessary data movement. Theoretical models are presented to determine the best method for a given redistribution. To evaluate the proposed methods, we implemented them on an IBM SP2 parallel machine. The experimental results show that this approach improves redistribution performance in most cases.
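To make the BLOCK-CYCLIC(s) to BLOCK-CYCLIC(t) mapping concrete, the following is a minimal sketch in plain Python, not the paper's method. It assumes a one-dimensional array of `n` elements, the same number of processes before and after redistribution, and the usual block-cyclic ownership rule; the function names `owner`, `local_index`, and `send_sets` are hypothetical. For every (source, destination) pair it lists the source-local indices that must move. Index lists of this kind could serve as displacements for an MPI derived datatype such as one built with `MPI_Type_indexed`, so that data is sent directly from the local array rather than copied into a pack buffer first. The abstract does not say which datatype constructors the authors use.

```python
from collections import defaultdict


def owner(i, block, nprocs):
    """Rank that owns global element i under BLOCK-CYCLIC(block)."""
    return (i // block) % nprocs


def local_index(i, block, nprocs):
    """Position of global element i inside its owner's local array."""
    return (i // (block * nprocs)) * block + i % block


def send_sets(n, s, t, nprocs):
    """Map (src, dst) -> source-local indices to send, in global order.

    Assumption: the process count is the same before and after.
    """
    sets = defaultdict(list)
    for i in range(n):
        src = owner(i, s, nprocs)  # owner under the old distribution
        dst = owner(i, t, nprocs)  # owner under the new distribution
        sets[(src, dst)].append(local_index(i, s, nprocs))
    return dict(sets)


if __name__ == "__main__":
    # 12 elements on 2 processes, BLOCK-CYCLIC(2) -> BLOCK-CYCLIC(3)
    for (src, dst), idx in sorted(send_sets(12, 2, 3, 2).items()):
        print(f"P{src} -> P{dst}: local indices {idx}")
```

In this small case, the elements P0 sends to P1 sit at local indices 2, 3, and 5. They are not contiguous, which is the situation where describing the layout with a datatype, instead of packing by hand, avoids an extra copy.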