I-Cheng K. Chen, Chih-Chieh Lee, M. Postiff, T. Mudge
{"title":"高速每地址两级分支预测器的设计优化","authors":"I-Cheng K. Chen, Chih-Chieh Lee, M. Postiff, T. Mudge","doi":"10.1109/ICCD.1997.628854","DOIUrl":null,"url":null,"abstract":"Per-address two-level branch predictors have been shown to be among the best predictors and have been implemented in current microprocessors. However, as the cycle time of modern microprocessors continues to decrease, the implementation of set-associative per-address two-level branch predictors will become more difficult. Instead, direct-mapped designs may be more attractive. In this paper, we investigate an alternative implementation of the per-address two-level predictor referred to as the tagless, direct-mapped predictor which is simpler and has faster access time. The tagless predictor can offer comparable performance to current set-associative designs since removal of tags allows more resources to be allocated for the predictor and branch target buffer (BTB). Removal of tags also decouples the per-address predictors from the BTB, thus allowing the two components to be optimized individually. Furthermore, our results show that this tagless implementation is more accurate because it handles conflict misses in the branch history table better. Finally, we examine the system cost-benefit for tagless per-address predictors across a wide design space using equal-cost contours. We study the sensitivity of performance to the workloads by comparing results from the Instruction Benchmark Suite (IBS) and SPEC CINT95. Our work provides principles and quantitative parameters for optimal configurations of such predictors.","PeriodicalId":154864,"journal":{"name":"Proceedings International Conference on Computer Design VLSI in Computers and Processors","volume":"25 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1997-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Design optimization for high-speed per-address two-level branch predictors\",\"authors\":\"I-Cheng K. Chen, Chih-Chieh Lee, M. Postiff, T. Mudge\",\"doi\":\"10.1109/ICCD.1997.628854\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Per-address two-level branch predictors have been shown to be among the best predictors and have been implemented in current microprocessors. However, as the cycle time of modern microprocessors continues to decrease, the implementation of set-associative per-address two-level branch predictors will become more difficult. Instead, direct-mapped designs may be more attractive. In this paper, we investigate an alternative implementation of the per-address two-level predictor referred to as the tagless, direct-mapped predictor which is simpler and has faster access time. The tagless predictor can offer comparable performance to current set-associative designs since removal of tags allows more resources to be allocated for the predictor and branch target buffer (BTB). Removal of tags also decouples the per-address predictors from the BTB, thus allowing the two components to be optimized individually. Furthermore, our results show that this tagless implementation is more accurate because it handles conflict misses in the branch history table better. Finally, we examine the system cost-benefit for tagless per-address predictors across a wide design space using equal-cost contours. We study the sensitivity of performance to the workloads by comparing results from the Instruction Benchmark Suite (IBS) and SPEC CINT95. Our work provides principles and quantitative parameters for optimal configurations of such predictors.\",\"PeriodicalId\":154864,\"journal\":{\"name\":\"Proceedings International Conference on Computer Design VLSI in Computers and Processors\",\"volume\":\"25 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1997-10-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings International Conference on Computer Design VLSI in Computers and Processors\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICCD.1997.628854\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings International Conference on Computer Design VLSI in Computers and Processors","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCD.1997.628854","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Design optimization for high-speed per-address two-level branch predictors
Per-address two-level branch predictors have been shown to be among the best predictors and have been implemented in current microprocessors. However, as the cycle time of modern microprocessors continues to decrease, the implementation of set-associative per-address two-level branch predictors will become more difficult. Instead, direct-mapped designs may be more attractive. In this paper, we investigate an alternative implementation of the per-address two-level predictor referred to as the tagless, direct-mapped predictor which is simpler and has faster access time. The tagless predictor can offer comparable performance to current set-associative designs since removal of tags allows more resources to be allocated for the predictor and branch target buffer (BTB). Removal of tags also decouples the per-address predictors from the BTB, thus allowing the two components to be optimized individually. Furthermore, our results show that this tagless implementation is more accurate because it handles conflict misses in the branch history table better. Finally, we examine the system cost-benefit for tagless per-address predictors across a wide design space using equal-cost contours. We study the sensitivity of performance to the workloads by comparing results from the Instruction Benchmark Suite (IBS) and SPEC CINT95. Our work provides principles and quantitative parameters for optimal configurations of such predictors.