Alberto Sanz, R. Asenjo, Juan López, R. Larrosa, A. Navarro, V. Litvinov, Sung-Eun Choi, B. Chamberlain
{"title":"在Chapel中通过通信聚合实现全局数据重新分配","authors":"Alberto Sanz, R. Asenjo, Juan López, R. Larrosa, A. Navarro, V. Litvinov, Sung-Eun Choi, B. Chamberlain","doi":"10.1109/SBAC-PAD.2012.18","DOIUrl":null,"url":null,"abstract":"Chapel is a parallel programming language designed to improve the productivity and ease of use of conventional and parallel computers. This language currently delivers sub optimal performance when executing codes that perform global data re-allocation operations on distributed memory architectures. This is mainly due to data communication that is done without aggregation (one message for each remote array element). In this work, we analyze Chapel's standard Block and Cyclic distribution modules and optimize the communication routines for array assignments by performing aggregation. Thanks to the expressive power of Chapel, the compiler and runtime have enough information to do communication aggregation without user intervention. The runtime relies on the low-level GAS Net networking layer, whose versions of one-sided bulk put/get routines that support strides are particularly useful for us. Experimental results conducted on Hector (a Cray XE6) and Jaguar (Cray XK6)reveal that the implemented techniques can lead to significant reductions in communication time.","PeriodicalId":232444,"journal":{"name":"2012 IEEE 24th International Symposium on Computer Architecture and High Performance Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"21","resultStr":"{\"title\":\"Global Data Re-allocation via Communication Aggregation in Chapel\",\"authors\":\"Alberto Sanz, R. Asenjo, Juan López, R. Larrosa, A. Navarro, V. Litvinov, Sung-Eun Choi, B. Chamberlain\",\"doi\":\"10.1109/SBAC-PAD.2012.18\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Chapel is a parallel programming language designed to improve the productivity and ease of use of conventional and parallel computers. This language currently delivers sub optimal performance when executing codes that perform global data re-allocation operations on distributed memory architectures. This is mainly due to data communication that is done without aggregation (one message for each remote array element). In this work, we analyze Chapel's standard Block and Cyclic distribution modules and optimize the communication routines for array assignments by performing aggregation. Thanks to the expressive power of Chapel, the compiler and runtime have enough information to do communication aggregation without user intervention. The runtime relies on the low-level GAS Net networking layer, whose versions of one-sided bulk put/get routines that support strides are particularly useful for us. Experimental results conducted on Hector (a Cray XE6) and Jaguar (Cray XK6)reveal that the implemented techniques can lead to significant reductions in communication time.\",\"PeriodicalId\":232444,\"journal\":{\"name\":\"2012 IEEE 24th International Symposium on Computer Architecture and High Performance Computing\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2012-10-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"21\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2012 IEEE 24th International Symposium on Computer Architecture and High Performance Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SBAC-PAD.2012.18\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 IEEE 24th International Symposium on Computer Architecture and High Performance Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SBAC-PAD.2012.18","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Global Data Re-allocation via Communication Aggregation in Chapel
Chapel is a parallel programming language designed to improve the productivity and ease of use of conventional and parallel computers. This language currently delivers sub optimal performance when executing codes that perform global data re-allocation operations on distributed memory architectures. This is mainly due to data communication that is done without aggregation (one message for each remote array element). In this work, we analyze Chapel's standard Block and Cyclic distribution modules and optimize the communication routines for array assignments by performing aggregation. Thanks to the expressive power of Chapel, the compiler and runtime have enough information to do communication aggregation without user intervention. The runtime relies on the low-level GAS Net networking layer, whose versions of one-sided bulk put/get routines that support strides are particularly useful for us. Experimental results conducted on Hector (a Cray XE6) and Jaguar (Cray XK6)reveal that the implemented techniques can lead to significant reductions in communication time.