Using Hardware Transactional Memory to Enable Speculative Trace Optimization

2015 International Symposium on Computer Architecture and High Performance Computing Workshop (SBAC-PADW). Pub Date: 2015-10-18. DOI: 10.1109/SBAC-PADW.2015.13

Juan Salamanca, J. N. Amaral, G. Araújo


Citations: 0

Abstract

This paper describes a novel speculation technique for optimizing, and simultaneously executing, multiple alternative traces of hot code regions. The technique, called Speculative Trace Optimization (STO), enumerates, optimizes, and speculatively executes traces of hot loops. It requires hardware support that can be provided in a fashion similar to that available in Hardware Transactional Memory (HTM) systems. The paper discusses the features needed to support STO, namely multi-versioning, lazy conflict resolution, eager conflict detection, and transaction synchronization. A review of existing HTM architectures (Intel TSX, IBM BG/Q, and IBM POWER8) shows that none of them has all the features required to implement STO. However, this work demonstrates that STO can be implemented on top of existing HTM architectures by adding privatization and pause/resume code. The evaluation of a prototype STO implementation on top of Intel TSX, using benchmarks from Parboil, MediaBench, and SPEC2006, indicates that STO can yield whole-program speedups of up to 9%. This initial result is promising given that the prototype incurs significant overhead from the code that compensates for features absent from TSX. An analysis included in the paper suggests that HTM mechanisms have the potential to considerably improve trace performance, provided that they efficiently implement the suggested features.
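The idea of running alternative traces speculatively and keeping only the one whose assumptions hold can be illustrated with a small software emulation. The sketch below is an assumption-laden toy, not the paper's implementation and not Intel TSX code: it runs the traces one after another rather than simultaneously, it emulates privatization by copying the loop state into a private dictionary, and it emulates commit and abort by either merging or discarding that copy. All names (`Trace`, `TraceAbort`, `run_iteration`, `trace_even`, `trace_odd`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]


class TraceAbort(Exception):
    """Raised when a trace's guard fails, i.e. the speculated path was wrong."""


@dataclass
class Trace:
    name: str
    body: Callable[[State], None]  # mutates a private copy; may raise TraceAbort


def trace_even(s: State) -> None:
    # Hypothetical trace specialized for the path where x is even.
    if s["x"] % 2 != 0:
        raise TraceAbort
    s["acc"] += s["x"] // 2
    s["x"] += 1


def trace_odd(s: State) -> None:
    # Hypothetical trace specialized for the path where x is odd.
    if s["x"] % 2 == 0:
        raise TraceAbort
    s["acc"] += 3 * s["x"]
    s["x"] += 1


def run_iteration(shared: State, traces: List[Trace]) -> str:
    """Try each trace on a private copy; commit the first one that does not abort."""
    for t in traces:
        private = dict(shared)  # privatization: speculative writes stay private
        try:
            t.body(private)
        except TraceAbort:
            continue  # abort: the private copy is simply discarded
        shared.update(private)  # commit: publish the buffered writes
        return t.name
    raise RuntimeError("no trace covered this path")


def run_loop(n: int) -> Tuple[State, List[str]]:
    """Run n iterations of the toy hot loop and record which trace committed."""
    shared: State = {"x": 0, "acc": 0}
    traces = [Trace("even", trace_even), Trace("odd", trace_odd)]
    committed = [run_iteration(shared, traces) for _ in range(n)]
    return shared, committed
```

Calling `run_loop(4)` walks four iterations and records which trace committed in each. In a real HTM-based design the private copies and the commit/abort decision would be handled by hardware transactions, which is where the features listed in the abstract (multi-versioning, conflict detection and resolution, synchronization) come in.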
