Multi-chain prefetching: effective exploitation of inter-chain memory parallelism for pointer-chasing codes

Nicholas Kohout, Seungryul Choi, Dongkeun Kim, D. Yeung
Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques (PACT 2001), published 2001-09-08. DOI: 10.1109/PACT.2001.953307
Citations: 32

Abstract

This paper presents multi-chain prefetching, a technique that uses offline analysis and a hardware prefetch engine to prefetch multiple independent pointer chains simultaneously, thereby exploiting inter-chain memory parallelism to tolerate memory latency. The paper makes three contributions. First, we introduce a scheduling algorithm that identifies independent pointer chains in pointer-chasing codes and computes a prefetch schedule that overlaps serialized cache misses across separate chains. Our analysis focuses on static traversals; we also propose using speculation to identify independent pointer chains in dynamic traversals. Second, we present the design of a prefetch engine that traverses pointer-based data structures and overlaps multiple pointer chains according to our scheduling algorithm. Third, we conduct an experimental evaluation of multi-chain prefetching and compare its performance against two existing techniques: jump pointer prefetching and prefetch arrays. Our results show that multi-chain prefetching improves execution time by 40% across six pointer-chasing kernels from the Olden benchmark suite and by 8% across four SPECInt CPU2000 benchmarks. Multi-chain prefetching also outperforms jump pointer prefetching and prefetch arrays by 28% on Olden and by 12% on SPECInt. Furthermore, speculation can enable multi-chain prefetching for some dynamic traversal codes, but the technique loses its effectiveness when the pointer-chain traversal order is unpredictable. Finally, we show that combining multi-chain prefetching with prefetch arrays can potentially provide higher performance than either technique alone.
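The intuition behind inter-chain memory parallelism can be illustrated with a toy latency model. This is a hypothetical sketch, not the paper's scheduling algorithm or prefetch engine: it assumes every node fetch misses with the same fixed `latency`, that misses within one chain are serialized (each next pointer depends on the previous load), that the chain heads are known up front (as in a static traversal), and that at most a hypothetical `max_outstanding` misses can be in flight at once. The longest-remaining-first ordering is an illustrative choice made here, not one taken from the paper.

```python
from typing import List


def serialized_cycles(chain_lengths: List[int], latency: int) -> int:
    """Toy model: traverse chains one after another, so every miss is exposed."""
    return sum(chain_lengths) * latency


def overlapped_cycles(chain_lengths: List[int], latency: int,
                      max_outstanding: int) -> int:
    """Toy model: each round lasts `latency` cycles, and up to
    `max_outstanding` independent chains advance by one node per round.
    Misses inside a single chain remain serialized."""
    if max_outstanding < 1:
        raise ValueError("max_outstanding must be at least 1")
    remaining = [n for n in chain_lengths if n > 0]
    rounds = 0
    while remaining:
        # Illustrative heuristic: advance the chains with the most nodes left.
        remaining.sort(reverse=True)
        for i in range(min(max_outstanding, len(remaining))):
            remaining[i] -= 1
        remaining = [n for n in remaining if n > 0]
        rounds += 1
    return rounds * latency


if __name__ == "__main__":
    chains = [4, 4, 4]  # three independent chains of four nodes each
    print(serialized_cycles(chains, latency=100))                     # 1200
    print(overlapped_cycles(chains, latency=100, max_outstanding=4))  # 400
    print(overlapped_cycles(chains, latency=100, max_outstanding=2))  # 600
```

In this model, overlapping independent chains cuts the exposed latency from the sum of all chain lengths down to roughly the longest chain (or the total node count divided by `max_outstanding`, whichever is larger), which is the kind of overlap of serialized misses across chains that the abstract describes.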