Automatic partitioning of data and computations on scalable shared memory multiprocessors

S. Tandri, T. Abdelrahman
Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162), 1997-08-11
DOI: 10.1109/ICPP.1997.622557 (https://doi.org/10.1109/ICPP.1997.622557)
Citations: 33

Abstract

This paper describes an algorithm for deriving data and computation partitions on scalable shared memory multiprocessors. The algorithm establishes affinity relationships between where computations are performed and where data is located, based on the array accesses in the program. It then uses these affinity relationships to determine both static and dynamic partitions for arrays and parallel loops. Experimental results from a prototype implementation demonstrate that the algorithm is computationally efficient and that it improves the parallel performance of standard benchmarks. The results also show the necessity of taking shared memory effects (memory contention, cache locality, false sharing, and synchronization) into account: partitions derived to minimize only interprocessor communication do not necessarily result in the best performance.
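The idea of affinity between where a computation runs and where its data lives can be illustrated with a toy model. The sketch below is not the authors' algorithm; it is a minimal illustration that assumes a single parallel loop over `i` accessing `A[i + off]` for a few fixed offsets, and it counts how many accesses touch data owned by another processor under a given pairing of loop partition and array partition. All function names (`block_owner`, `cyclic_owner`, `count_remote`) are hypothetical.

```python
def block_owner(index, extent, num_procs):
    """Processor that owns `index` when range(extent) is split into contiguous blocks."""
    block = -(-extent // num_procs)  # ceiling division
    return index // block


def cyclic_owner(index, extent, num_procs):
    """Processor that owns `index` when indices are dealt out round-robin."""
    return index % num_procs


def count_remote(n, num_procs, offsets, iter_owner, data_owner):
    """Count accesses A[i + off] whose element is owned by a different
    processor than the one executing iteration i (toy cost model)."""
    remote = 0
    for i in range(n):
        p = iter_owner(i, n, num_procs)
        for off in offsets:
            j = i + off
            # Skip out-of-bounds accesses at the array edges.
            if 0 <= j < n and data_owner(j, n, num_procs) != p:
                remote += 1
    return remote


if __name__ == "__main__":
    n, procs, offsets = 16, 4, (-1, 0, 1)
    # Loop and array partitioned the same way: only block boundaries are remote.
    aligned = count_remote(n, procs, offsets, block_owner, block_owner)
    # Loop partitioned cyclically over a block-partitioned array: most accesses are remote.
    mismatched = count_remote(n, procs, offsets, cyclic_owner, block_owner)
    print("aligned:", aligned, "mismatched:", mismatched)
```

Note that this model counts only remote accesses. As the abstract stresses, a partition chosen on that basis alone may still perform poorly once memory contention, cache locality, false sharing, and synchronization are considered, none of which this sketch models.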