Design optimization for high-speed per-address two-level branch predictors

Proceedings International Conference on Computer Design VLSI in Computers and Processors
Publication date: 1997-10-12
DOI: 10.1109/ICCD.1997.628854

I-Cheng K. Chen, Chih-Chieh Lee, M. Postiff, T. Mudge

Citations: 2

Abstract

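Per-address two-level branch predictors have been shown to be among the best predictors and have been implemented in current microprocessors. However, as the cycle time of modern microprocessors continues to decrease, implementing set-associative per-address two-level branch predictors will become more difficult, and direct-mapped designs may be more attractive. In this paper, we investigate an alternative implementation of the per-address two-level predictor, referred to as the tagless, direct-mapped predictor, which is simpler and has a faster access time. The tagless predictor can offer performance comparable to current set-associative designs, since removing tags allows more resources to be allocated to the predictor and the branch target buffer (BTB). Removing tags also decouples the per-address predictor from the BTB, allowing the two components to be optimized individually. Furthermore, our results show that the tagless implementation is more accurate because it handles conflict misses in the branch history table better. Finally, we examine the system cost-benefit of tagless per-address predictors across a wide design space using equal-cost contours. We study the sensitivity of performance to the workload by comparing results from the Instruction Benchmark Suite (IBS) and SPEC CINT95. Our work provides principles and quantitative parameters for optimal configurations of such predictors.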
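
The abstract describes the tagless, direct-mapped organization only in words. The Python sketch below is a minimal illustration under stated assumptions, not the authors' simulator: the branch history table (BHT) is indexed by PC bits above a 2-bit instruction offset, the second level is a single shared pattern history table (PHT) of 2-bit saturating counters (a PAg-style layout), counters start "weakly taken", and both the storage cost model and the 20-bit tag width used for comparison are hypothetical. It shows the two properties the abstract relies on: a lookup never misses because there is no tag compare, and the bits a tagged design would spend on tags are freed for history or more entries.

```python
from typing import List


class TaglessPerAddressPredictor:
    """Sketch of a tagless, direct-mapped per-address two-level predictor.

    Level 1: a BHT indexed by low-order PC bits; each entry holds the last
    `hist_bits` outcomes seen at that index. No tags are stored, so branches
    that alias to the same index simply share an entry.
    Level 2: one shared PHT of 2-bit saturating counters indexed by the
    history pattern (PAg-style layout; an assumption for this sketch).
    """

    def __init__(self, bht_entries: int = 1024, hist_bits: int = 8) -> None:
        if bht_entries <= 0 or bht_entries & (bht_entries - 1):
            raise ValueError("bht_entries must be a power of two")
        self.hist_bits = hist_bits
        self.bht: List[int] = [0] * bht_entries
        # Counters start at 2 ("weakly taken"); the initial value is an assumption.
        self.pht: List[int] = [2] * (1 << hist_bits)

    def _bht_index(self, pc: int) -> int:
        # Direct-mapped lookup: drop 2 offset bits (assumes 4-byte instructions),
        # then keep log2(bht_entries) bits. No tag compare, so no miss path.
        return (pc >> 2) & (len(self.bht) - 1)

    def predict(self, pc: int) -> bool:
        history = self.bht[self._bht_index(pc)]
        return self.pht[history] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._bht_index(pc)
        history = self.bht[i]
        ctr = self.pht[history]
        # Saturating 2-bit counter update in the PHT.
        self.pht[history] = min(ctr + 1, 3) if taken else max(ctr - 1, 0)
        # Shift the new outcome into this entry's local history.
        self.bht[i] = ((history << 1) | int(taken)) & ((1 << self.hist_bits) - 1)

    def storage_bits(self, tag_bits: int = 0) -> int:
        """Hypothetical cost model: BHT history (+ optional tag) bits plus
        2 bits per PHT counter. tag_bits=0 models the tagless design."""
        return len(self.bht) * (self.hist_bits + tag_bits) + 2 * len(self.pht)


pred = TaglessPerAddressPredictor(bht_entries=1024, hist_bits=8)
outcomes = [True, True, True, False] * 50  # loop branch: 3 taken, 1 not taken
hits = []
for taken in outcomes:
    hits.append(pred.predict(0x400) == taken)
    pred.update(0x400, taken)

print(sum(hits[-100:]) / 100)           # steady-state accuracy: 1.0
print(pred.storage_bits())              # tagless: 1024*8 + 2*256 = 8704
print(pred.storage_bits(tag_bits=20))   # with hypothetical 20-bit tags: 29184
```

Because indexing ignores the upper PC bits, branches that are 4 × `bht_entries` bytes apart (here 0x400 and 0x1400) share one history entry; the tagless design keeps using that shared entry instead of treating the access as a miss and discarding the stored history.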
