Stream chaining: exploiting multiple levels of correlation in data prefetching

Pedro Díaz, Marcelo H. Cintra
{"title":"Stream chaining: exploiting multiple levels of correlation in data prefetching","authors":"Pedro Díaz, Marcelo H. Cintra","doi":"10.1145/1555754.1555767","DOIUrl":null,"url":null,"abstract":"Data prefetching has long been an important technique to amortize the effects of the memory wall, and is likely to remain so in the current era of multi-core systems. Most prefetchers operate by identifying patterns and correlations in the miss address stream. Separating streams according to the memory access instruction that generates the misses is an effective way of filtering out spurious addresses from predictable streams. On the other hand, by localizing streams based on the memory access instructions, such prefetchers both lose the complete time sequence information of misses and can only issue prefetches for a single memory access instruction at a time.\n This paper proposes a novel class of prefetchers based on the idea of linking various localized streams into predictable chains of missing memory access instructions such that the prefetcher can issue prefetches along multiple streams. In this way the prefetcher is not limited to prefetching deeply for a single missing memory access instruction but can instead adaptively prefetch for other memory access instructions closer in time.\n Experimental results show that the proposed prefetcher consistently achieves better performance than a state-of-the-art prefetcher -- 10% on average, being only outperformed in very few cases and then by only 2%, and outperforming that prefetcher by as much as 55% -- while consuming the same amount of memory bandwidth.","PeriodicalId":91388,"journal":{"name":"Proceedings. International Symposium on Computer Architecture","volume":"2009 1","pages":"81-92"},"PeriodicalIF":0.0000,"publicationDate":"2009-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"45","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. International Symposium on Computer Architecture","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1555754.1555767","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 45

Abstract

Data prefetching has long been an important technique to amortize the effects of the memory wall, and is likely to remain so in the current era of multi-core systems. Most prefetchers operate by identifying patterns and correlations in the miss address stream. Separating streams according to the memory access instruction that generates the misses is an effective way of filtering out spurious addresses from predictable streams. On the other hand, by localizing streams based on the memory access instructions, such prefetchers both lose the complete time sequence information of misses and can only issue prefetches for a single memory access instruction at a time. This paper proposes a novel class of prefetchers based on the idea of linking various localized streams into predictable chains of missing memory access instructions such that the prefetcher can issue prefetches along multiple streams. In this way the prefetcher is not limited to prefetching deeply for a single missing memory access instruction but can instead adaptively prefetch for other memory access instructions closer in time. Experimental results show that the proposed prefetcher consistently achieves better performance than a state-of-the-art prefetcher -- 10% on average, being only outperformed in very few cases and then by only 2%, and outperforming that prefetcher by as much as 55% -- while consuming the same amount of memory bandwidth.
流链:在数据预取中利用多级相关性
长期以来,数据预取一直是分摊内存墙影响的重要技术,并且在当前的多核系统时代可能仍然如此。大多数预取器通过识别丢失地址流中的模式和相关性来操作。根据产生错误的内存访问指令分离流是一种从可预测流中过滤掉虚假地址的有效方法。另一方面,由于基于内存访问指令对流进行了本地化,这种预取器既丢失了丢失的完整时间序列信息,又一次只能对单个内存访问指令发出预取。本文提出了一类新的预取器,基于将各种本地化流链接到缺失内存访问指令的可预测链中的想法,以便预取器可以沿多个流发出预取。通过这种方式,预取器不局限于对单个丢失的内存访问指令进行深度预取,而是可以自适应地预取时间更近的其他内存访问指令。实验结果表明,所提出的预取器始终比最先进的预取器获得更好的性能——平均10%,仅在极少数情况下优于2%,并且在消耗相同数量的内存带宽的情况下优于该预取器多达55%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信