Non-Sequential Instruction Cache Prefetching for Multiple-Issue Processors

A. Veidenbaum, Qing Zhao, Abduhl Shameer
Journal: Int. J. High Speed Comput. Published: 1999-03-01. DOI: https://doi.org/10.1142/S0129053399000065
Citations: 11

Abstract

This paper presents a novel instruction cache prefetching mechanism for multiple-issue processors. At high clock rates, such processors often have to use a small instruction cache, which can have a significant miss rate. Prefetching from a secondary cache or even memory can hide the instruction cache miss penalty, but only if it is initiated sufficiently far ahead of the current program counter (PC). Existing instruction cache prefetching methods are strictly sequential and do not prefetch past conditional branches, which may occur almost every clock cycle in wide-issue processors. In this study, multi-level branch prediction is used to overcome this limitation. By keeping branch history and target addresses, two methods are defined to predict a future PC several branches past the current branch. A prefetching architecture using such a mechanism is defined and evaluated with respect to its accuracy, the impact of instruction prefetching on performance, and its interaction with sequential prefetching. Both PC-based and history-based predictors are used to perform a single-lookup prediction. Targeting a low-latency on-chip L2 cache, prediction for 3 branch levels is evaluated for a 4-issue processor and cache architecture patterned after the DEC Alpha-21164. The history-based predictor is shown to be more accurate, but both predictors are effective. The prefetching unit that uses them succeeds in cases where the sequential prefetcher fails. In addition, non-sequential prefetching is better at hiding latency because it is initiated earlier. The two types of prefetching eliminate different types of misses and thus can be effectively combined to achieve better performance.
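
The abstract does not describe how the predictor tables are organized, so the following Python sketch is only an illustration of the general idea of a single-lookup, multi-level prediction: one table access at the current branch returns the address that execution reached several branches later the last time the same key was seen. The class name `MultiLevelPredictor`, the table size, the `>> 2` word alignment, and the XOR hash used for the history-based variant are all assumptions made for this example and are not taken from the paper.

```python
from collections import deque


class MultiLevelPredictor:
    """Hypothetical sketch: map a lookup key to the address reached
    `depth` branches later, so a prefetch can be issued early."""

    def __init__(self, depth=3, table_bits=10, use_history=False):
        self.depth = depth                # branch levels to look ahead
        self.mask = (1 << table_bits) - 1
        self.use_history = use_history    # False: PC-based, True: history-based
        self.history = 0                  # recent taken/not-taken outcomes
        self.table = {}                   # index -> predicted future PC
        self.pending = deque()            # indices still waiting for their future PC

    def _index(self, branch_pc):
        key = branch_pc >> 2              # assumes 4-byte instructions
        if self.use_history:
            key ^= self.history           # assumed hash of PC and history
        return key & self.mask

    def predict(self, branch_pc):
        """Single lookup: return a prefetch address, or None if untrained."""
        return self.table.get(self._index(branch_pc))

    def update(self, branch_pc, taken, next_pc):
        """Train with a resolved branch and the address execution went to."""
        self.pending.append(self._index(branch_pc))
        if len(self.pending) == self.depth:
            # The oldest pending branch now knows where execution ended up
            # after `depth` branches, counting itself.
            self.table[self.pending.popleft()] = next_pc
        self.history = ((self.history << 1) | int(taken)) & self.mask


# Usage: a three-branch loop; after one pass, the first branch predicts
# the address reached after the third branch.
predictor = MultiLevelPredictor(depth=3)
for pc, taken, next_pc in [(0x100, True, 0x200),
                           (0x210, True, 0x300),
                           (0x320, True, 0x0F0)]:
    predictor.update(pc, taken, next_pc)
prefetch_address = predictor.predict(0x100)
```

In this sketch `predict` must be called before `update` for the same branch, so that both use the same history value when forming the index. A real prefetch unit would also need to decide when a predicted address is worth fetching and how it shares bandwidth with a sequential prefetcher, which the sketch leaves out.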