Using Hardware Transactional Memory to Enable Speculative Trace Optimization

Juan Salamanca, J. N. Amaral, G. Araújo
2015 International Symposium on Computer Architecture and High Performance Computing Workshop (SBAC-PADW), published 2015-10-18
DOI: 10.1109/SBAC-PADW.2015.13 (https://doi.org/10.1109/SBAC-PADW.2015.13)
Citations: 0

Abstract

This paper describes a novel speculation technique for the optimization and simultaneous execution of multiple alternative traces of hot code regions. The technique, called Speculative Trace Optimization (STO), enumerates, optimizes, and speculatively executes traces of hot loops. It requires hardware support that can be provided in a fashion similar to that available in Hardware Transactional Memory (HTM) systems. The paper discusses the features necessary to support STO, namely multi-versioning, lazy conflict resolution, eager conflict detection, and transaction synchronization. A review of existing HTM architectures (Intel TSX, IBM BG/Q, and IBM POWER8) shows that none of them has all the features required to implement STO. However, this work demonstrates that STO can be implemented on top of existing HTM architectures through the addition of privatization and pause/resume code. The evaluation of a prototype STO implementation on top of Intel TSX, using benchmarks from Parboil, Media Bench, and SPEC2006, indicates that STO can yield whole-program speedups of up to 9%. This initial result is promising given that the prototype incurs significant overhead from the code that compensates for features absent from TSX. An analysis included in the paper suggests that HTM mechanisms have the potential to considerably improve trace performance, provided that they efficiently implement the suggested features.
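The abstract gives no code, so the following is only a minimal software sketch of the general idea of speculating on alternative traces with privatization. It is not the authors' STO implementation and uses no HTM instructions. The loop body, the trace functions, the `TraceAbort` exception, and the state dictionary are all hypothetical names invented for illustration. The sketch also tries the traces one after another, whereas the paper describes executing them simultaneously.

```python
import copy


class TraceAbort(Exception):
    """Raised when a trace's branch assumption does not hold (hypothetical)."""


# Original loop body being specialized (assumed for illustration):
#     if x[i] > 0: total += x[i]
#     else:        total -= 1
#     i += 1

def trace_positive(st):
    """Straight-line trace specialized for the path where x[i] > 0."""
    v = st["x"][st["i"]]
    if v <= 0:
        raise TraceAbort  # assumption violated: abandon this trace
    st["total"] += v
    st["i"] += 1


def trace_nonpositive(st):
    """Straight-line trace specialized for the path where x[i] <= 0."""
    v = st["x"][st["i"]]
    if v > 0:
        raise TraceAbort
    st["total"] -= 1
    st["i"] += 1


def run_iteration(traces, state):
    """Try each trace on a private copy of the state; commit the first survivor."""
    for trace in traces:
        private = copy.deepcopy(state)  # privatization: speculative writes stay local
        try:
            trace(private)
        except TraceAbort:
            continue  # the private copy is dropped, so no partial writes leak out
        state.clear()
        state.update(private)  # commit the successful trace's writes
        return trace.__name__
    # A real system would fall back to the unoptimized loop body here.
    raise RuntimeError("no trace matched this iteration")


def run_loop(x):
    """Run the whole loop, returning the final total and the committed trace names."""
    state = {"x": list(x), "i": 0, "total": 0}
    committed = []
    while state["i"] < len(state["x"]):
        committed.append(run_iteration([trace_positive, trace_nonpositive], state))
    return state["total"], committed
```

For example, `run_loop([3, -1, 0, 5])` commits `trace_positive` for the first and last elements and `trace_nonpositive` for the middle two, giving a total of 6. Copying the whole state per attempt is the simplest way to show privatization in software; it says nothing about how the prototype on Intel TSX actually buffers or compensates for speculative writes.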