BeBoP: A cost effective predictor infrastructure for superscalar value prediction

2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA), Pub Date: 2015-02-01, DOI: 10.1109/HPCA.2015.7056018

Arthur Perais, André Seznec

Pages: 13-25

Citations: 38

Abstract

Until recently, a performance-effective implementation of Value Prediction (VP) was thought to add tremendous complexity and power consumption to the pipeline, especially in the Out-of-Order (OoO) engine and the predictor infrastructure. Despite recent progress in the field of Value Prediction, this remains partially true. Indeed, although the recently proposed EOLE architecture suggests that the OoO engine need not be altered to accommodate VP, complexity in the predictor infrastructure itself is still problematic. First, multiple predictions must be generated each cycle, but multi-ported structures should be avoided. Second, the predictor should be small enough to be considered for implementation, yet coverage must remain high enough to increase performance. To address these remaining concerns, we first propose BeBoP, a block-based value prediction scheme that mimics current instruction fetch mechanisms. It associates the predicted values with a fetch block rather than with distinct instructions. Second, to remedy the storage issue, we present the Differential VTAGE (D-VTAGE) predictor. This new tightly coupled hybrid predictor covers instructions predictable by both VTAGE and Stride-based value predictors, and its hardware cost and complexity can be made similar to those of a modern branch predictor. Third, we show that block-based value prediction makes it possible to implement, at moderate cost, the checkpointing mechanism needed to provide D-VTAGE with last computed/predicted values. Overall, we establish that EOLE with a 32.8KB block-based D-VTAGE predictor and a 4-issue OoO engine can significantly outperform a baseline 6-issue superscalar processor on our benchmark set, by up to 62.2% and by 11.2% on average (gmean).
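The block-based idea in the abstract can be illustrated with a small software model. The following is only a minimal sketch under stated assumptions, not the predictor described in the paper: it models a plain stride predictor whose table is indexed by fetch-block address, so that one read returns a prediction for every slot of the block. The class and function names, the table size, the block size, the number of slots per block, and the confidence threshold are all hypothetical choices made for illustration. The VTAGE tagged components and the checkpointing of speculative last values are deliberately left out.

```python
from dataclasses import dataclass, field

SLOTS_PER_BLOCK = 4   # assumed number of prediction slots per fetch block
CONF_THRESHOLD = 3    # assumed confidence required before a prediction is used


@dataclass
class Slot:
    last_value: int = 0   # last value observed for this slot
    stride: int = 0       # difference between the last two observed values
    confidence: int = 0   # saturating counter, reset when the stride changes


@dataclass
class BlockEntry:
    slots: list = field(
        default_factory=lambda: [Slot() for _ in range(SLOTS_PER_BLOCK)]
    )


class BlockStridePredictor:
    """Hypothetical block-indexed stride predictor (illustration only)."""

    def __init__(self, num_entries=256, block_bytes=16):
        self.num_entries = num_entries
        self.block_bytes = block_bytes
        self.table = [BlockEntry() for _ in range(num_entries)]

    def _index(self, block_pc):
        # Index with the fetch-block address, not the instruction address.
        return (block_pc // self.block_bytes) % self.num_entries

    def predict(self, block_pc):
        """Single table read: one prediction (or None) per slot of the block."""
        entry = self.table[self._index(block_pc)]
        return [
            s.last_value + s.stride if s.confidence >= CONF_THRESHOLD else None
            for s in entry.slots
        ]

    def update(self, block_pc, slot, actual):
        """Train one slot with the value actually computed by the instruction."""
        s = self.table[self._index(block_pc)].slots[slot]
        new_stride = actual - s.last_value
        if new_stride == s.stride:
            s.confidence = min(s.confidence + 1, CONF_THRESHOLD)
        else:
            s.confidence = 0
            s.stride = new_stride
        s.last_value = actual
```

In this model, a slot that observes the values 10, 20, 30, 40 reaches the confidence threshold and then predicts 50, while untrained slots in the same block return no prediction. Because all slots of a block live in one entry, a single access serves the whole fetch block, which is the property the abstract relies on to avoid multi-ported structures.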


