在可扩展的共享内存多处理器上对数据和计算进行自动分区

Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162) Pub Date : 1997-08-11 DOI:10.1109/ICPP.1997.622557

S. Tandri, T. Abdelrahman

{"title":"在可扩展的共享内存多处理器上对数据和计算进行自动分区","authors":"S. Tandri, T. Abdelrahman","doi":"10.1109/ICPP.1997.622557","DOIUrl":null,"url":null,"abstract":"This paper describes an algorithm for deriving data and computation partitions on scalable shared memory multiprocessors. The algorithm establishes affinity relationships between where computations are performed and where data is located based on array accesses in the program. The algorithm then uses these affinity relationships to determine both static and dynamic partitions for arrays and parallel loops. Experimental results from a prototype implementation of the algorithm demonstrate that it is computationally efficient and that it improves the parallel performance of standard benchmarks. The results also show the necessity of taking shared memory effects (memory contention, cache locality, false-sharing and synchronization) into account-partitions derived to minimize only interprocessor communications do not necessarily result in the best performance.","PeriodicalId":221761,"journal":{"name":"Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"33","resultStr":"{\"title\":\"Automatic partitioning of data and computations on scalable shared memory multiprocessors\",\"authors\":\"S. Tandri, T. Abdelrahman\",\"doi\":\"10.1109/ICPP.1997.622557\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper describes an algorithm for deriving data and computation partitions on scalable shared memory multiprocessors. The algorithm establishes affinity relationships between where computations are performed and where data is located based on array accesses in the program. The algorithm then uses these affinity relationships to determine both static and dynamic partitions for arrays and parallel loops. Experimental results from a prototype implementation of the algorithm demonstrate that it is computationally efficient and that it improves the parallel performance of standard benchmarks. The results also show the necessity of taking shared memory effects (memory contention, cache locality, false-sharing and synchronization) into account-partitions derived to minimize only interprocessor communications do not necessarily result in the best performance.\",\"PeriodicalId\":221761,\"journal\":{\"name\":\"Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162)\",\"volume\":\"5 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1997-08-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"33\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICPP.1997.622557\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICPP.1997.622557","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 33

摘要

本文描述了一种在可扩展共享内存多处理器上导出数据和计算分区的算法。该算法根据程序中的数组访问，在执行计算的位置和数据的位置之间建立了亲和关系。然后，该算法使用这些亲和关系来确定数组和并行循环的静态和动态分区。该算法的原型实现实验结果表明，该算法计算效率高，并且提高了标准基准的并行性能。结果还显示了考虑共享内存效应(内存争用、缓存局部性、错误共享和同步)的必要性——仅为最小化处理器间通信而派生的分区不一定会产生最佳性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Automatic partitioning of data and computations on scalable shared memory multiprocessors

This paper describes an algorithm for deriving data and computation partitions on scalable shared memory multiprocessors. The algorithm establishes affinity relationships between where computations are performed and where data is located based on array accesses in the program. The algorithm then uses these affinity relationships to determine both static and dynamic partitions for arrays and parallel loops. Experimental results from a prototype implementation of the algorithm demonstrate that it is computationally efficient and that it improves the parallel performance of standard benchmarks. The results also show the necessity of taking shared memory effects (memory contention, cache locality, false-sharing and synchronization) into account-partitions derived to minimize only interprocessor communications do not necessarily result in the best performance.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162)

自引率

0.00%

发文量