Grouping array layouts to reduce communication and improve locality of parallel programs
Tien-Pao Shih, E. Davidson
Proceedings of 1994 International Conference on Parallel and Distributed Systems, 19 December 1994. DOI: 10.1109/ICPADS.1994.590375
A data layout method, array grouping, is proposed to improve the communication efficiency and cache utilization of parallel programs that contain indirect array references or nonunit-stride indexing. The conditions under which this technique can be applied are established in a series of theorems. The technique is then applied to a real finite element program. On 56 processors of the KSR1 parallel computer, the experimental results show that array grouping reduces communication by 15% and data subcache misses by 40%.
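To make the layout idea concrete, the following is a minimal sketch of why grouping can help under indirect references. It is not the authors' algorithm and does not encode the paper's applicability theorems. It simply counts the distinct cache lines touched when three arrays (hypothetically `x`, `y`, `z`, e.g. node coordinates in a finite element mesh) are gathered through an index array: first with each array stored on its own, then with the arrays interleaved into one array of records. The function names, the 8-byte element size, and the 64-byte line size are illustrative assumptions, not KSR1 parameters.

```python
# Minimal sketch of the intuition behind array grouping (not the paper's code).
# Hypothetical assumptions: 8-byte elements, 64-byte cache lines, and
# `num_arrays` arrays that are always read together through an index array.

ELEM_BYTES = 8   # assumed element size
LINE_BYTES = 64  # assumed cache-line size


def lines_touched_separate(indices, num_arrays, n):
    """Distinct cache lines touched when each array is stored separately.

    Array k occupies bytes [k*n*ELEM_BYTES, (k+1)*n*ELEM_BYTES).
    """
    lines = set()
    for i in indices:
        for k in range(num_arrays):
            addr = (k * n + i) * ELEM_BYTES
            lines.add(addr // LINE_BYTES)
    return len(lines)


def lines_touched_grouped(indices, num_arrays):
    """Distinct cache lines touched when the arrays are grouped (interleaved),
    so element i of every array lies in one contiguous record."""
    lines = set()
    record_bytes = num_arrays * ELEM_BYTES
    for i in indices:
        base = i * record_bytes
        for k in range(num_arrays):
            lines.add((base + k * ELEM_BYTES) // LINE_BYTES)
    return len(lines)


if __name__ == "__main__":
    n = 1024
    indices = [0, 100, 200, 300]  # hypothetical gathered (indirect) indices
    print(lines_touched_separate(indices, 3, n))  # 12
    print(lines_touched_grouped(indices, 3))      # 4
```

With separate arrays, every gathered index costs one line per array; with the grouped layout, the three values for an index share a 24-byte record and usually fall in a single line (a record can still straddle a line boundary). On a shared-memory multiprocessor, touching fewer distinct lines per gather also means fewer coherence units fetched from remote memory, which is one plausible way to read the reported drops in communication and subcache misses.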