反馈定向预取:提高硬件预取器的性能和带宽效率

2007 IEEE 13th International Symposium on High Performance Computer Architecture Pub Date : 2007-02-10 DOI:10.1109/HPCA.2007.346185

S. Srinath, O. Mutlu, Hyesoon Kim, Y. Patt

{"title":"反馈定向预取:提高硬件预取器的性能和带宽效率","authors":"S. Srinath, O. Mutlu, Hyesoon Kim, Y. Patt","doi":"10.1109/HPCA.2007.346185","DOIUrl":null,"url":null,"abstract":"High performance processors employ hardware data prefetching to reduce the negative performance impact of large main memory latencies. While prefetching improves performance substantially on many programs, it can significantly reduce performance on others. Also, prefetching can significantly increase memory bandwidth requirements. This paper proposes a mechanism that incorporates dynamic feedback into the design of the prefetcher to increase the performance improvement provided by prefetching as well as to reduce the negative performance and bandwidth impact of prefetching. Our mechanism estimates prefetcher accuracy, prefetcher timeliness, and prefetcher-caused cache pollution to adjust the aggressiveness of the data prefetcher dynamically. We introduce a new method to track cache pollution caused by the prefetcher at run-time. We also introduce a mechanism that dynamically decides where in the LRU stack to insert the prefetched blocks in the cache based on the cache pollution caused by the prefetcher. Using the proposed dynamic mechanism improves average performance by 6.5% on 17 memory-intensive benchmarks in the SPEC CPU2000 suite compared to the best-performing conventional stream-based data prefetcher configuration, while it consumes 18.7% less memory bandwidth. Compared to a conventional stream-based data prefetcher configuration that consumes similar amount of memory bandwidth, feedback directed prefetching provides 13.6% higher performance. Our results show that feedback-directed prefetching eliminates the large negative performance impact incurred on some benchmarks due to prefetching, and it is applicable to stream-based prefetchers, global-history-buffer based delta correlation prefetchers, and PC-based stride prefetchers","PeriodicalId":177324,"journal":{"name":"2007 IEEE 13th International Symposium on High Performance Computer Architecture","volume":"41 46","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2007-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"330","resultStr":"{\"title\":\"Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers\",\"authors\":\"S. Srinath, O. Mutlu, Hyesoon Kim, Y. Patt\",\"doi\":\"10.1109/HPCA.2007.346185\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"High performance processors employ hardware data prefetching to reduce the negative performance impact of large main memory latencies. While prefetching improves performance substantially on many programs, it can significantly reduce performance on others. Also, prefetching can significantly increase memory bandwidth requirements. This paper proposes a mechanism that incorporates dynamic feedback into the design of the prefetcher to increase the performance improvement provided by prefetching as well as to reduce the negative performance and bandwidth impact of prefetching. Our mechanism estimates prefetcher accuracy, prefetcher timeliness, and prefetcher-caused cache pollution to adjust the aggressiveness of the data prefetcher dynamically. We introduce a new method to track cache pollution caused by the prefetcher at run-time. We also introduce a mechanism that dynamically decides where in the LRU stack to insert the prefetched blocks in the cache based on the cache pollution caused by the prefetcher. Using the proposed dynamic mechanism improves average performance by 6.5% on 17 memory-intensive benchmarks in the SPEC CPU2000 suite compared to the best-performing conventional stream-based data prefetcher configuration, while it consumes 18.7% less memory bandwidth. Compared to a conventional stream-based data prefetcher configuration that consumes similar amount of memory bandwidth, feedback directed prefetching provides 13.6% higher performance. Our results show that feedback-directed prefetching eliminates the large negative performance impact incurred on some benchmarks due to prefetching, and it is applicable to stream-based prefetchers, global-history-buffer based delta correlation prefetchers, and PC-based stride prefetchers\",\"PeriodicalId\":177324,\"journal\":{\"name\":\"2007 IEEE 13th International Symposium on High Performance Computer Architecture\",\"volume\":\"41 46\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2007-02-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"330\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2007 IEEE 13th International Symposium on High Performance Computer Architecture\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HPCA.2007.346185\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2007 IEEE 13th International Symposium on High Performance Computer Architecture","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCA.2007.346185","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 330

摘要

高性能处理器采用硬件数据预取来减少大型主存延迟对性能的负面影响。虽然预取在许多程序上大大提高了性能，但在其他程序上却会显著降低性能。此外，预取会显著增加内存带宽需求。本文提出了一种将动态反馈纳入预取器设计的机制，以增加预取提供的性能改进，并减少预取对性能和带宽的负面影响。我们的机制估计预取器的准确性、预取器的及时性和预取器引起的缓存污染，以动态地调整数据预取器的侵略性。提出了一种新的方法来跟踪预取器在运行时造成的缓存污染。我们还引入了一种机制，该机制根据预取器造成的缓存污染动态地决定在LRU堆栈中插入预取块的位置。在SPEC CPU2000套件的17个内存密集型基准测试中，与性能最佳的传统基于流的数据预取器配置相比，使用所提出的动态机制将平均性能提高6.5%，同时消耗的内存带宽减少18.7%。与传统的基于流的数据预取配置相比，反馈定向预取的性能提高了13.6%。我们的结果表明，反馈导向预取消除了由于预取而对某些基准产生的巨大负面性能影响，并且它适用于基于流的预取器、基于全局历史缓冲区的增量相关预取器和基于pc的跨步预取器

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers

High performance processors employ hardware data prefetching to reduce the negative performance impact of large main memory latencies. While prefetching improves performance substantially on many programs, it can significantly reduce performance on others. Also, prefetching can significantly increase memory bandwidth requirements. This paper proposes a mechanism that incorporates dynamic feedback into the design of the prefetcher to increase the performance improvement provided by prefetching as well as to reduce the negative performance and bandwidth impact of prefetching. Our mechanism estimates prefetcher accuracy, prefetcher timeliness, and prefetcher-caused cache pollution to adjust the aggressiveness of the data prefetcher dynamically. We introduce a new method to track cache pollution caused by the prefetcher at run-time. We also introduce a mechanism that dynamically decides where in the LRU stack to insert the prefetched blocks in the cache based on the cache pollution caused by the prefetcher. Using the proposed dynamic mechanism improves average performance by 6.5% on 17 memory-intensive benchmarks in the SPEC CPU2000 suite compared to the best-performing conventional stream-based data prefetcher configuration, while it consumes 18.7% less memory bandwidth. Compared to a conventional stream-based data prefetcher configuration that consumes similar amount of memory bandwidth, feedback directed prefetching provides 13.6% higher performance. Our results show that feedback-directed prefetching eliminates the large negative performance impact incurred on some benchmarks due to prefetching, and it is applicable to stream-based prefetchers, global-history-buffer based delta correlation prefetchers, and PC-based stride prefetchers

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2007 IEEE 13th International Symposium on High Performance Computer Architecture

自引率

0.00%

发文量