Non-Sequential Instruction Cache Prefetching for Multiple-Issue Processors

Int. J. High Speed Comput. Pub Date : 1999-03-01 DOI:10.1142/S0129053399000065

A. Veidenbaum, Qing Zhao, Abduhl Shameer


Citations: 11

Abstract

This paper presents a novel instruction cache prefetching mechanism for multiple-issue processors. At high clock rates, such processors often have to use a small instruction cache, which can have a significant miss rate. Prefetching from a secondary cache or even from memory can hide the instruction cache miss penalty, but only if it is initiated sufficiently far ahead of the current program counter (PC). Existing instruction cache prefetching methods are strictly sequential and do not prefetch past conditional branches, which may occur almost every clock cycle in wide-issue processors. In this study, multi-level branch prediction is used to overcome this limitation. By keeping branch history and target addresses, two methods are defined to predict a future PC several branches past the current branch. A prefetching architecture using such a mechanism is defined and evaluated with respect to its accuracy, the impact of instruction prefetching on performance, and its interaction with sequential prefetching. Both PC-based and history-based predictors are used to perform a single-lookup prediction. Targeting a low-latency on-chip L2 cache, prediction for 3 branch levels is evaluated for a 4-issue processor and cache architecture patterned after the DEC Alpha-21164. The history-based predictor is shown to be more accurate, but both predictors are effective. A prefetching unit using them can be effective and succeeds when the sequential prefetcher fails. In addition, non-sequential prefetching is better at hiding latency because it is initiated earlier. The two types of prefetching eliminate different types of misses and can therefore be combined effectively to achieve better performance.
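
The abstract does not describe how the predictor tables are organized, so the following is only a minimal sketch of the general idea of a single-lookup, multi-level prediction, not the authors' design. It assumes a table keyed either by the branch PC alone (PC-based) or by the branch PC together with a short global taken/not-taken history (history-based), where each entry stores the PC at which execution continued three branches later. The class and function names, `HISTORY_BITS`, and `LINE_BYTES` are hypothetical values chosen for illustration.

```python
from collections import deque

LEVELS = 3        # branch levels to look ahead (the abstract evaluates 3)
HISTORY_BITS = 4  # hypothetical global-history length
LINE_BYTES = 32   # hypothetical instruction cache line size


class MultiLevelPredictor:
    """Maps a key formed at one branch to the PC reached LEVELS branches later."""

    def __init__(self, use_history):
        self.use_history = use_history
        self.table = {}         # key -> predicted future PC
        self.history = 0        # global taken/not-taken shift register
        self.pending = deque()  # keys still waiting for their future PC

    def _key(self, branch_pc):
        # History-based keys combine the PC with recent branch outcomes;
        # PC-based keys use the branch PC alone.
        if self.use_history:
            return (branch_pc, self.history)
        return branch_pc

    def predict(self, branch_pc):
        """Single lookup: predicted PC LEVELS branches ahead, or None."""
        return self.table.get(self._key(branch_pc))

    def update(self, branch_pc, taken, next_pc):
        """Train on a resolved branch; next_pc is where execution continued."""
        self.pending.append(self._key(branch_pc))
        if len(self.pending) == LEVELS:
            # The oldest pending branch now has LEVELS resolved branches
            # (itself included) behind it, so its future PC is known.
            self.table[self.pending.popleft()] = next_pc
        mask = (1 << HISTORY_BITS) - 1
        self.history = ((self.history << 1) | int(taken)) & mask


def line_address(pc):
    """Cache-line address that a prefetch for the predicted PC would request."""
    return pc - (pc % LINE_BYTES)
```

In this sketch, a prefetch unit would call `predict` at each fetched branch and, when the result is not `None`, request `line_address(result)` from the L2 cache, while `update` is called as branches resolve. Keying on history lets the same branch PC map to different future PCs depending on the path that led to it, which is one plausible reason a history-based predictor could be more accurate, though the abstract does not state the cause.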

