Instruction Prefetching of Systems Codes with Layout Optimized for Reduced Cache Misses

23rd Annual International Symposium on Computer Architecture (ISCA'96), Pub Date: 1996-05-15, DOI: 10.1145/232973.233001

Chun Xia, J. Torrellas


Citations: 37

Abstract

High-performing on-chip instruction caches are crucial to keep fast processors busy. Unfortunately, while on-chip caches are usually successful at intercepting instruction fetches in loop-intensive engineering codes, they are less able to do so in large systems codes. To improve the performance of the latter codes, the compiler can be used to lay out the code in memory for reduced cache conflicts. Interestingly, such an operation leaves the code in a state that can be exploited by a new type of instruction prefetching: guarded sequential prefetching. The idea is that the compiler leaves hints in the code as to how the code was laid out. Then, at run time, the prefetching hardware detects these hints and uses them to prefetch more effectively. This scheme can be implemented very cheaply: one bit encoded in control transfer instructions and a prefetch module that requires minor extensions to existing next-line sequential prefetchers. Furthermore, the scheme can be turned off and on at run time with the toggling of a bit in the TLB. The scheme is evaluated with simulations using complete traces from a 4-processor machine. Overall, for 16-Kbyte primary instruction caches, guarded sequential prefetching removes, on average, 66% of the instruction misses remaining in an operating system with an optimized layout, speeding up the operating system by 10%. Moreover, the scheme is more cost-effective and robust than existing sequential prefetching techniques.
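
The mechanism described in the abstract can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the authors' design: the abstract does not say what the one-bit hint means, so the sketch assumes that a set guard bit on a control transfer instruction tells the prefetcher to stop prefetching sequentially past that instruction. The line size, instruction size, prefetch depth, and all names (`Instr`, `make_code`, `guarded_prefetch`, `page_enabled`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

LINE_SIZE = 32   # bytes per cache line (assumed, not from the abstract)
INSTR_SIZE = 4   # bytes per instruction (assumed, not from the abstract)


@dataclass(frozen=True)
class Instr:
    is_control_transfer: bool = False
    guard: bool = False  # hypothetical one-bit compiler hint


def make_code(n, guarded=(), branches=()):
    """Build n contiguous instructions starting at address 0.

    `guarded` holds indices of control transfers with the guard bit set;
    `branches` holds indices of control transfers without it.
    """
    return [
        Instr(is_control_transfer=(i in guarded or i in branches),
              guard=(i in guarded))
        for i in range(n)
    ]


def guarded_prefetch(code, miss_addr, max_lines=4, page_enabled=True):
    """Return the cache-line numbers to prefetch after a miss at miss_addr.

    Assumed policy: keep prefetching the next sequential line until a
    guarded control transfer is seen in the most recently fetched line,
    until max_lines lines have been requested, or until the code ends.
    `page_enabled` models the per-page on/off bit the abstract places
    in the TLB.
    """
    if not page_enabled:
        return []
    per_line = LINE_SIZE // INSTR_SIZE
    idx = miss_addr // INSTR_SIZE          # first instruction to inspect
    line = idx // per_line                 # line that missed
    last_line = (len(code) - 1) // per_line
    prefetched = []
    while len(prefetched) < max_lines and line < last_line:
        line_end = (line + 1) * per_line
        # Stop if the current line holds a guarded control transfer.
        if any(i.is_control_transfer and i.guard for i in code[idx:line_end]):
            break
        line += 1
        idx = line * per_line
        prefetched.append(line)
    return prefetched


if __name__ == "__main__":
    # Four lines of code; guarded branch in line 2, plain branch in line 0.
    code = make_code(32, guarded={19}, branches={5})
    print(guarded_prefetch(code, miss_addr=0))
    print(guarded_prefetch(code, miss_addr=0, max_lines=1))
    print(guarded_prefetch(code, miss_addr=0, page_enabled=False))
```

With `max_lines=1` the model degenerates to a plain next-line sequential prefetcher, which matches the abstract's remark that the scheme needs only minor extensions to such prefetchers. The unguarded branch in line 0 does not stop prefetching in this model, whereas the guarded one in line 2 does.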

