Sandbox Prefetching: Safe run-time evaluation of aggressive prefetchers

Seth H. Pugsley, Zeshan A. Chishti, C. Wilkerson, Peng-fei Chuang, Robert L. Scott, A. Jaleel, Shih-Lien Lu, K. Chow, R. Balasubramonian
{"title":"Sandbox Prefetching: Safe run-time evaluation of aggressive prefetchers","authors":"Seth H. Pugsley, Zeshan A. Chishti, C. Wilkerson, Peng-fei Chuang, Robert L. Scott, A. Jaleel, Shih-Lien Lu, K. Chow, R. Balasubramonian","doi":"10.1109/HPCA.2014.6835971","DOIUrl":null,"url":null,"abstract":"Memory latency is a major factor in limiting CPU performance, and prefetching is a well-known method for hiding memory latency. Overly aggressive prefetching can waste scarce resources such as memory bandwidth and cache capacity, limiting or even hurting performance. It is therefore important to employ prefetching mechanisms that use these resources prudently, while still prefetching required data in a timely manner. In this work, we propose a new mechanism to determine at run-time the appropriate prefetching mechanism for the currently executing program, called Sandbox Prefetching. Sandbox Prefetching evaluates simple, aggressive offset prefetchers at run-time by adding the prefetch address to a Bloom filter, rather than actually fetching the data into the cache. Subsequent cache accesses are tested against the contents of the Bloom filter to see if the aggressive prefetcher under evaluation could have accurately prefetched the data, while simultaneously testing for the existence of prefetchable streams. Real prefetches are performed when the accuracy of evaluated prefetchers exceeds a threshold. This method combines the ideas of global pattern confirmation and immediate prefetching action to achieve high performance. Sandbox Prefetching improves performance across the tested workloads by 47.6% compared to not using any prefetching, and by 18.7% compared to the Feedback Directed Prefetching technique. Performance is also improved by 1.4% compared to the Access Map Pattern Matching Prefetcher, while incurring considerably less logic and storage overheads.","PeriodicalId":164587,"journal":{"name":"2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"106","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCA.2014.6835971","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 106

Abstract

Memory latency is a major factor in limiting CPU performance, and prefetching is a well-known method for hiding memory latency. Overly aggressive prefetching can waste scarce resources such as memory bandwidth and cache capacity, limiting or even hurting performance. It is therefore important to employ prefetching mechanisms that use these resources prudently, while still prefetching required data in a timely manner. In this work, we propose a new mechanism to determine at run-time the appropriate prefetching mechanism for the currently executing program, called Sandbox Prefetching. Sandbox Prefetching evaluates simple, aggressive offset prefetchers at run-time by adding the prefetch address to a Bloom filter, rather than actually fetching the data into the cache. Subsequent cache accesses are tested against the contents of the Bloom filter to see if the aggressive prefetcher under evaluation could have accurately prefetched the data, while simultaneously testing for the existence of prefetchable streams. Real prefetches are performed when the accuracy of evaluated prefetchers exceeds a threshold. This method combines the ideas of global pattern confirmation and immediate prefetching action to achieve high performance. Sandbox Prefetching improves performance across the tested workloads by 47.6% compared to not using any prefetching, and by 18.7% compared to the Feedback Directed Prefetching technique. Performance is also improved by 1.4% compared to the Access Map Pattern Matching Prefetcher, while incurring considerably less logic and storage overhead.
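
As a concrete illustration of the scheme described above, the following is a minimal Python sketch of the sandbox evaluation loop. It is a reconstruction from the abstract only, not the paper's hardware design: the names `SandboxedOffsetPrefetcher`, `BloomFilter`, `accuracy_threshold`, and `eval_period` are illustrative assumptions, and the stream-detection check mentioned in the abstract is omitted.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter keyed on integer cache-line addresses (illustrative)."""

    def __init__(self, size_bits=2048, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)

    def _indices(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.size

    def add(self, key):
        for idx in self._indices(key):
            self.bits[idx] = 1

    def contains(self, key):
        return all(self.bits[idx] for idx in self._indices(key))


class SandboxedOffsetPrefetcher:
    """Sketch of evaluating one fixed-offset prefetcher in a 'sandbox' before
    letting it issue real prefetches. Names and parameters are assumptions,
    not the paper's exact hardware organization."""

    def __init__(self, offset, accuracy_threshold=0.5, eval_period=256):
        self.offset = offset                  # candidate cache-line offset
        self.threshold = accuracy_threshold   # required sandbox hit rate
        self.eval_period = eval_period        # accesses per evaluation interval
        self.activated = False                # may the candidate really prefetch?
        self._reset_interval()

    def _reset_interval(self):
        self.sandbox = BloomFilter()
        self.accesses = 0
        self.sandbox_hits = 0

    def on_cache_access(self, line_addr):
        """Call on each cache access; returns a line address to prefetch, or None."""
        self.accesses += 1
        # Score the candidate: would one of its earlier (pretend) prefetches
        # have covered this access?
        if self.sandbox.contains(line_addr):
            self.sandbox_hits += 1
        # Record the candidate's prediction in the sandbox instead of fetching it.
        self.sandbox.add(line_addr + self.offset)

        # At the end of an interval, decide whether the candidate earned the
        # right to issue real prefetches during the next interval.
        if self.accesses >= self.eval_period:
            self.activated = (self.sandbox_hits / self.accesses) >= self.threshold
            self._reset_interval()

        return line_addr + self.offset if self.activated else None
```

In the paper's design, several candidate offset prefetchers are sandboxed concurrently and only the accurate ones are allowed to issue real prefetches; the sketch shows just the per-candidate score-before-acting structure, with one evaluation interval granting or revoking prefetch rights for the next.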