Stream chaining: exploiting multiple levels of correlation in data prefetching

Proceedings. International Symposium on Computer Architecture | Pub Date: 2009-06-15 | DOI: 10.1145/1555754.1555767

Pedro Díaz, Marcelo H. Cintra

Pages: 81-92

Citations: 45

Abstract

Data prefetching has long been an important technique for mitigating the effects of the memory wall, and is likely to remain so in the current era of multi-core systems. Most prefetchers operate by identifying patterns and correlations in the miss address stream. Separating streams according to the memory access instruction that generates the misses is an effective way of filtering out spurious addresses from predictable streams. On the other hand, by localizing streams based on the memory access instructions, such prefetchers both lose the complete time sequence information of misses and can only issue prefetches for a single memory access instruction at a time.

This paper proposes a novel class of prefetchers based on the idea of linking various localized streams into predictable chains of missing memory access instructions, so that the prefetcher can issue prefetches along multiple streams. In this way the prefetcher is not limited to prefetching deeply for a single missing memory access instruction but can instead adaptively prefetch for other memory access instructions that are closer in time.

Experimental results show that the proposed prefetcher consistently outperforms a state-of-the-art prefetcher: by 10% on average and by as much as 55%, while being outperformed in only a few cases, and then by only 2%. It does so while consuming the same amount of memory bandwidth.
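To make the idea of chaining localized streams concrete, the following is a minimal toy sketch, not the design evaluated in the paper. It assumes that each memory access instruction (identified by its PC) forms a simple constant-stride stream, and that a chain is just a table recording which PC missed immediately after each PC. The class name `ChainedStridePrefetcher`, the `chain_length` parameter, and the table layout are all hypothetical choices made for illustration; the abstract does not specify the actual hardware structures.

```python
class ChainedStridePrefetcher:
    """Toy model (an assumption, not the paper's hardware design):
    per-PC stride streams linked by a 'next missing PC' table."""

    def __init__(self, chain_length=3):
        self.chain_length = chain_length  # max prefetches issued per miss
        self.last_addr = {}   # pc -> address of that PC's most recent miss
        self.stride = {}      # pc -> last observed address delta for that PC
        self.next_pc = {}     # pc -> PC that missed right after it
        self.prev_pc = None   # PC of the previous miss in global time order

    def on_miss(self, pc, addr):
        """Record a miss and return a list of (pc, predicted_address)."""
        # 1. Update the localized (per-instruction) stream.
        if pc in self.last_addr:
            self.stride[pc] = addr - self.last_addr[pc]
        self.last_addr[pc] = addr

        # 2. Link the previously missing instruction to this one,
        #    which preserves some of the global time ordering.
        if self.prev_pc is not None:
            self.next_pc[self.prev_pc] = pc
        self.prev_pc = pc

        # 3. Walk the chain and issue one prefetch per linked stream,
        #    instead of prefetching deeply along this PC's stream only.
        prefetches = []
        visited = set()
        cur = self.next_pc.get(pc)
        while (cur is not None and cur not in visited
               and len(prefetches) < self.chain_length):
            visited.add(cur)
            if cur in self.stride:
                prefetches.append((cur, self.last_addr[cur] + self.stride[cur]))
            cur = self.next_pc.get(cur)
        return prefetches
```

For a loop in which three instructions miss in the order A, B, C with fixed strides, this sketch begins predicting the next miss of each instruction in the chain once every stream has been seen twice. A purely PC-localized prefetcher would instead issue all of its prefetches along the stream of the single instruction that just missed.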



