Increasing the energy efficiency of TLS systems using intermediate checkpointing

2011 18th International Conference on High Performance Computing. Pub Date: 2011-12-18. DOI: 10.1109/HiPC.2011.6152735

Salman Khan, Nikolas Ioannou, Polychronis Xekalakis, Marcelo H. Cintra


Citations: 2

Abstract

With the advent of Chip Multiprocessors (CMPs), improving performance relies on programmers and compilers to expose thread-level parallelism to the underlying hardware. However, this is a difficult and error-prone process for programmers, while state-of-the-art compiler techniques are unable to provide significant benefits for many classes of applications. An alternative is offered by systems that support Thread Level Speculation (TLS), which relieve the programmer and compiler of checking for thread dependences and instead use the hardware to enforce them. Unfortunately, TLS suffers from power inefficiency because data misspeculations cause threads to roll back to the beginning of the speculative task. For this reason, intermediate checkpointing of TLS threads has been proposed. When a violation does occur, the thread only has to roll back to a checkpoint before the violating instruction, not to the start of the task. However, previous work omits study of the microarchitectural details and implementation issues that are essential for effective checkpointing. In this paper we study checkpointing on a state-of-the-art TLS system. We systematically study the costs associated with checkpointing and analyze the tradeoffs. We also propose changes to the TLS mechanism to allow effective checkpointing. Further, we establish the need for accurately identifying points in execution that are appropriate for checkpointing, and we analyze various techniques for doing so in terms of both effectiveness and viability. We propose program-counter-based and hybrid predictors and show that they outperform previous proposals. Placing checkpoints based on dependence predictors results in power improvements while maintaining the performance advantage of TLS. The proposed checkpointing system achieves an energy saving of up to 14%, with an average of 7%, over normal TLS execution.
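
The rollback idea in the abstract can be illustrated with a small sketch. This is not the paper's mechanism: the abstract gives no microarchitectural details, so every name here (rollback_point, wasted_instructions, PCPredictor) and the 2-bit saturating counter scheme are assumptions chosen for illustration only. The sketch models a speculative task as a run of instruction indices, restarts from the latest checkpoint at or before the violating instruction, and uses a toy table indexed by program counter to decide where a checkpoint might be worth placing.

```python
from bisect import bisect_right


def rollback_point(checkpoints, violating_index):
    """Return the instruction index to restart from.

    checkpoints is a sorted list of instruction indices at which state was
    saved before executing that instruction. With no usable checkpoint the
    thread restarts at index 0, the start of the speculative task.
    """
    pos = bisect_right(checkpoints, violating_index)
    return checkpoints[pos - 1] if pos else 0


def wasted_instructions(executed, checkpoints, violating_index):
    """Instructions that must be re-executed after a violation is detected."""
    return executed - rollback_point(checkpoints, violating_index)


class PCPredictor:
    """Hypothetical dependence predictor keyed by program counter.

    Each load PC has a saturating counter; a checkpoint is suggested before
    loads whose counter has reached the threshold.
    """

    def __init__(self, threshold=2, max_count=3):
        self.threshold = threshold
        self.max_count = max_count
        self.counters = {}

    def should_checkpoint(self, pc):
        return self.counters.get(pc, 0) >= self.threshold

    def record_violation(self, pc):
        # Strengthen the prediction when this load caused a violation.
        self.counters[pc] = min(self.counters.get(pc, 0) + 1, self.max_count)

    def record_safe(self, pc):
        # Weaken the prediction when this load committed without a violation.
        self.counters[pc] = max(self.counters.get(pc, 0) - 1, 0)


if __name__ == "__main__":
    checkpoints = [100, 250, 400]
    # Violation on the load at index 300, detected after 500 instructions.
    print(wasted_instructions(500, [], 300))           # no checkpoints
    print(wasted_instructions(500, checkpoints, 300))  # with checkpoints
```

In this toy model the saving comes entirely from restarting later in the task; it ignores the cost of taking a checkpoint, which the abstract identifies as one of the tradeoffs the paper studies.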


