Handling branches in TLS systems with Multi-Path Execution

HPCA-16 2010: The Sixteenth International Symposium on High-Performance Computer Architecture. Publication date: 2010-04-01. DOI: 10.1109/HPCA.2010.5416632

Polychronis Xekalakis, Marcelo H. Cintra

Citations: 7

Abstract

Thread-Level Speculation (TLS) has been proposed to facilitate the extraction of parallel threads from sequential applications. Most prior work on TLS has focused on architectural features directly related to supporting the main TLS operations. In this work we instead investigate how a common microarchitectural feature, branch prediction, interacts with TLS. We show that branch prediction is even more important for TLS than it is for sequential execution. Unfortunately, branch prediction for TLS systems is also inherently harder: code partitioning and re-execution of squashed threads pollute the branch history, making it harder for predictors to be accurate. We thus propose to augment the hardware so as to accommodate Multi-Path Execution (MP) within the existing TLS protocol. Under the MP execution model, all paths following a number of hard-to-predict conditional branches are followed simultaneously. MP execution thus removes branches that would otherwise have been mispredicted, helping the core exploit more instruction-level parallelism (ILP). We show that, with only minimal hardware support, one can combine these two execution models into a unified one. Experimental results show that our combined execution model achieves speedups of up to 23.2% (9.2% on average) over an existing state-of-the-art TLS system, and speedups of up to 138% (28.2% on average) over MP execution, for a subset of the SPEC2000 Int benchmark suite.
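The core idea of MP execution, following both directions of a branch only when the predictor is unsure of it, can be illustrated with a toy trace-driven model. This is a hypothetical sketch, not the authors' hardware design: it assumes a per-branch 2-bit saturating-counter predictor, treats any counter that is not fully saturated as "low confidence" (a simple stand-in for a real confidence estimator), and does not model TLS threads, squashes, history pollution, or the resource cost of forked paths. The names `TwoBitPredictor`, `make_trace`, and `simulate` are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class TwoBitPredictor:
    """Per-branch 2-bit saturating counters: 0-1 predict not-taken, 2-3 predict taken."""
    counters: dict[int, int] = field(default_factory=dict)

    def _counter(self, pc: int) -> int:
        return self.counters.get(pc, 1)  # unseen branches start weakly not-taken

    def predict(self, pc: int) -> bool:
        return self._counter(pc) >= 2

    def confident(self, pc: int) -> bool:
        # Assumed confidence estimate: only fully saturated counters count as confident.
        return self._counter(pc) in (0, 3)

    def update(self, pc: int, taken: bool) -> None:
        c = self._counter(pc)
        self.counters[pc] = min(c + 1, 3) if taken else max(c - 1, 0)


def make_trace(iterations: int) -> list[tuple[int, bool]]:
    """Hypothetical (pc, taken) trace with one hard and one easy branch."""
    trace = []
    for i in range(iterations):
        trace.append((0x20, i % 2 == 0))  # data-dependent branch: alternates T/N
        trace.append((0x10, i % 8 != 7))  # loop-back branch: falls through every 8th time
    return trace


def simulate(trace: list[tuple[int, bool]], multipath: bool) -> tuple[int, int]:
    """Return (mispredictions, forks) for a branch trace.

    With multipath=True, a low-confidence branch forks: both directions are
    followed and the wrong one is discarded when the branch resolves, so no
    misprediction is charged. Confident branches are predicted as usual.
    """
    predictor = TwoBitPredictor()
    mispredictions = forks = 0
    for pc, taken in trace:
        if multipath and not predictor.confident(pc):
            forks += 1
        elif predictor.predict(pc) != taken:
            mispredictions += 1
        predictor.update(pc, taken)  # train on the resolved outcome either way
    return mispredictions, forks


if __name__ == "__main__":
    trace = make_trace(100)
    print("single-path:", simulate(trace, multipath=False))  # single-path: (113, 0)
    print("multi-path: ", simulate(trace, multipath=True))   # multi-path:  (12, 114)
```

In this toy trace, forking removes every misprediction on the alternating branch, but the loop exits are still mispredicted because the counter is saturated, and therefore "confident", exactly when the loop exits. MP can only remove mispredictions on the branches it chooses to fork, and each fork consumes execution resources, so the choice of which branches count as hard to predict matters.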