Improving trace cache effectiveness with branch promotion and trace packing

Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235). Pub Date: 1998-04-16. DOI: 10.1109/ISCA.1998.694786

Sanjay J. Patel, M. Evers, Y. Patt

Citations: 57

Abstract

The increasing widths of superscalar processors are placing greater demands upon the fetch mechanism. The trace cache meets these demands by placing logically contiguous instructions in physically contiguous storage. As a result, the trace cache delivers instructions at a high rate by supplying multiple fetch blocks each cycle. In this paper we examine two techniques to improve the number of instructions delivered each cycle by the trace cache. The first technique, branch promotion, dynamically converts strongly biased branches into branches with static predictions. Because these promoted branches require no dynamic prediction, the branch predictor suffers less from the negative effects of interference. Branch promotion unlocks the potential of the second technique: trace packing. With trace packing, trace segments are packed with as many instructions as will fit, without regard to naturally occurring fetch block boundaries. With both techniques, the effective fetch rate of the trace cache jumps up 17% over a trace cache which implements neither. On a machine where the execution engine has a very aggressive memory disambiguator, the performance of a machine using branch promotion and trace packing is on average 11% higher than that of a machine using neither technique.
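To make the two techniques concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that a branch counts as "strongly biased" once it has gone the same direction a fixed number of consecutive times, and that a promoted branch is demoted as soon as it goes the other way. The names `BranchBiasTable` and `build_segments`, the threshold value, and the 16-instruction segment capacity are all hypothetical choices made for illustration. The segment builder models only the instruction-capacity limit and ignores any other rule that might end a trace segment.

```python
from dataclasses import dataclass

PROMOTE_THRESHOLD = 64   # assumed value, chosen only for illustration
SEGMENT_CAPACITY = 16    # assumed segment size in instructions


@dataclass
class BiasEntry:
    last_outcome: bool = False   # direction of the most recent execution
    run_length: int = 0          # consecutive executions in that direction
    promoted: bool = False       # True once the branch is statically predicted


class BranchBiasTable:
    """Tracks per-branch bias and decides promotion (hypothetical policy)."""

    def __init__(self, threshold=PROMOTE_THRESHOLD):
        self.threshold = threshold
        self.entries = {}

    def update(self, pc, taken):
        """Record one outcome for the branch at `pc`; return its promoted flag."""
        entry = self.entries.setdefault(pc, BiasEntry())
        if entry.run_length > 0 and entry.last_outcome == taken:
            entry.run_length += 1
        else:
            # Direction changed: restart the run and demote (an assumption).
            entry.last_outcome = taken
            entry.run_length = 1
            entry.promoted = False
        if entry.run_length >= self.threshold:
            entry.promoted = True
        return entry.promoted


def build_segments(block_sizes, capacity=SEGMENT_CAPACITY, pack=False):
    """Group fetch blocks (given as instruction counts) into trace segments.

    pack=False: a block that does not fit starts a new segment.
    pack=True:  the segment is filled to capacity and the block is split.
    Returns the list of segment sizes in instructions.
    """
    segments = []
    current = 0
    for size in block_sizes:
        remaining = size
        while remaining > 0:
            free = capacity - current
            if remaining <= free:
                current += remaining
                remaining = 0
            elif pack:
                # Trace packing: ignore the block boundary, fill the segment.
                remaining -= free
                segments.append(capacity)
                current = 0
            elif current > 0:
                # No packing: close the partly filled segment first.
                segments.append(current)
                current = 0
            else:
                # A single block larger than a whole segment must be split.
                segments.append(capacity)
                remaining -= capacity
            if current == capacity:
                segments.append(current)
                current = 0
    if current:
        segments.append(current)
    return segments


if __name__ == "__main__":
    blocks = [10, 10, 10, 10]
    print(build_segments(blocks, pack=False))
    print(build_segments(blocks, pack=True))
```

With four 10-instruction fetch blocks and a capacity of 16, the unpacked builder produces four segments of 10 instructions each, while the packed builder produces segments of 16, 16, and 8. The same instructions occupy fewer, fuller segments, which is the effect trace packing aims for.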
