基于区域的软件分布式共享内存预取技术

2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing Pub Date : 2010-05-17 DOI:10.1109/CCGRID.2010.16

Jie Cai, P. Strazdins, Alistair P. Rendell

{"title":"基于区域的软件分布式共享内存预取技术","authors":"Jie Cai, P. Strazdins, Alistair P. Rendell","doi":"10.1109/CCGRID.2010.16","DOIUrl":null,"url":null,"abstract":"Although shared memory programming models show good programmability compared to message passing programming models, their implementation by page-based software distributed shared memory systems usually suffers from high memory consistency costs. The major part of these costs is inter-node data transfer for keeping virtual shared memory consistent. A good prefetch strategy can reduce this cost. We develop two prefetch techniques, TReP and HReP, which are based on the execution history of each parallel region. These techniques are evaluated using offline simulations with the NAS Parallel Benchmarks and the LINPACK benchmark. On average, TReP achieves an efficiency (ratio of pages prefetched that were subsequently accessed) of 96% and a coverage (ratio of access faults avoided by prefetches) of 65%. HReP achieves an efficiency of 91% but has a coverage of 79%. Treating the cost of an incorrectly prefetched page to be equivalent to that of a miss, these techniques have an effective page miss rate of 63% and 71% respectively. Additionally, these two techniques are compared with two well-known software distributed shared memory (sDSM) prefetch techniques, Adaptive++ and TODFCM. TReP effectively reduces page miss rate by 53% and 34% more, and HReP effectively reduces page miss rate by 62% and 43% more, compared to Adaptive++ and TODFCM respectively. As for Adaptive++, these techniques also permit bulk prefetching for pages predicted using temporal locality, amortizing network communication costs and permitting bandwidth improvement from multi-rail network interfaces.","PeriodicalId":444485,"journal":{"name":"2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing","volume":"10 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Region-Based Prefetch Techniques for Software Distributed Shared Memory Systems\",\"authors\":\"Jie Cai, P. Strazdins, Alistair P. Rendell\",\"doi\":\"10.1109/CCGRID.2010.16\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Although shared memory programming models show good programmability compared to message passing programming models, their implementation by page-based software distributed shared memory systems usually suffers from high memory consistency costs. The major part of these costs is inter-node data transfer for keeping virtual shared memory consistent. A good prefetch strategy can reduce this cost. We develop two prefetch techniques, TReP and HReP, which are based on the execution history of each parallel region. These techniques are evaluated using offline simulations with the NAS Parallel Benchmarks and the LINPACK benchmark. On average, TReP achieves an efficiency (ratio of pages prefetched that were subsequently accessed) of 96% and a coverage (ratio of access faults avoided by prefetches) of 65%. HReP achieves an efficiency of 91% but has a coverage of 79%. Treating the cost of an incorrectly prefetched page to be equivalent to that of a miss, these techniques have an effective page miss rate of 63% and 71% respectively. Additionally, these two techniques are compared with two well-known software distributed shared memory (sDSM) prefetch techniques, Adaptive++ and TODFCM. TReP effectively reduces page miss rate by 53% and 34% more, and HReP effectively reduces page miss rate by 62% and 43% more, compared to Adaptive++ and TODFCM respectively. As for Adaptive++, these techniques also permit bulk prefetching for pages predicted using temporal locality, amortizing network communication costs and permitting bandwidth improvement from multi-rail network interfaces.\",\"PeriodicalId\":444485,\"journal\":{\"name\":\"2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing\",\"volume\":\"10 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2010-05-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CCGRID.2010.16\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CCGRID.2010.16","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

摘要

尽管与消息传递编程模型相比，共享内存编程模型显示出良好的可编程性，但基于页面的软件分布式共享内存系统实现它们通常需要高昂的内存一致性成本。这些成本的主要部分是节点间数据传输，以保持虚拟共享内存的一致性。一个好的预取策略可以减少这个成本。基于每个并行区域的执行历史，我们开发了两种预取技术:TReP和HReP。使用NAS并行基准测试和LINPACK基准测试进行离线模拟，对这些技术进行评估。平均而言，TReP的效率(预取的页面随后被访问的比率)为96%，覆盖率(预取避免的访问错误比率)为65%。HReP的效率为91%，覆盖率为79%。如果将错误预取页面的成本等同于丢失页面的成本，这些技术的有效页面丢失率分别为63%和71%。此外，还将这两种技术与两种知名的软件分布式共享内存(sDSM)预取技术Adaptive++和TODFCM进行了比较。与自适应++和TODFCM相比，TReP有效降低页面缺失率分别提高53%和34%，HReP有效降低页面缺失率分别提高62%和43%。对于Adaptive++，这些技术还允许对使用时间局部性预测的页面进行批量预取，平摊网络通信成本，并允许从多轨道网络接口改进带宽。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Region-Based Prefetch Techniques for Software Distributed Shared Memory Systems

Although shared memory programming models show good programmability compared to message passing programming models, their implementation by page-based software distributed shared memory systems usually suffers from high memory consistency costs. The major part of these costs is inter-node data transfer for keeping virtual shared memory consistent. A good prefetch strategy can reduce this cost. We develop two prefetch techniques, TReP and HReP, which are based on the execution history of each parallel region. These techniques are evaluated using offline simulations with the NAS Parallel Benchmarks and the LINPACK benchmark. On average, TReP achieves an efficiency (ratio of pages prefetched that were subsequently accessed) of 96% and a coverage (ratio of access faults avoided by prefetches) of 65%. HReP achieves an efficiency of 91% but has a coverage of 79%. Treating the cost of an incorrectly prefetched page to be equivalent to that of a miss, these techniques have an effective page miss rate of 63% and 71% respectively. Additionally, these two techniques are compared with two well-known software distributed shared memory (sDSM) prefetch techniques, Adaptive++ and TODFCM. TReP effectively reduces page miss rate by 53% and 34% more, and HReP effectively reduces page miss rate by 62% and 43% more, compared to Adaptive++ and TODFCM respectively. As for Adaptive++, these techniques also permit bulk prefetching for pages predicted using temporal locality, amortizing network communication costs and permitting bandwidth improvement from multi-rail network interfaces.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing

自引率

0.00%

发文量