{"title":"使用硬件事务性内存启用推测跟踪优化","authors":"Juan Salamanca, J. N. Amaral, G. Araújo","doi":"10.1109/SBAC-PADW.2015.13","DOIUrl":null,"url":null,"abstract":"This paper describes a novel speculation technique for the optimization, and simultaneous execution, of multiple alternative traces of hot code regions. This technique, called Speculative Trace Optimization (STO), enumerates, optimizes, and speculatively executes traces of hot loops. It requires hardware support that can be provided in a similar fashion as that available in Hardware Transactional Memory (HTM) systems. This paper discusses the necessary features to support STO, namely multi-versioning, lazy conflict resolution, eager conflict detection, and transaction synchronization. A review of existing HTM architectures - Intel TSX, IBM BG/Q, and IBM POWER8 - shows that none of them have all the features required to implement STO. However, this work demonstrates that STO can be implemented on top of existing HTM architectures through the addition of privatization and pause/resume code. The evaluation of a prototype STO implementation, on top of Intel TSX, using benchmarks from Parboil, Media Bench, and SPEC2006, indicates that STO can yield whole-program speedups of up to 9%. This initial result is promising given that the prototype has significant overhead caused by the code that compensates for TSX absent features. An analysis, included in the paper, suggests that HTM mechanisms have the potential to considerably improve trace performance provided that they efficiently implement the suggested features.","PeriodicalId":161685,"journal":{"name":"2015 International Symposium on Computer Architecture and High Performance Computing Workshop (SBAC-PADW)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Using Hardware Transactional Memory to Enable Speculative Trace Optimization\",\"authors\":\"Juan Salamanca, J. N. Amaral, G. Araújo\",\"doi\":\"10.1109/SBAC-PADW.2015.13\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper describes a novel speculation technique for the optimization, and simultaneous execution, of multiple alternative traces of hot code regions. This technique, called Speculative Trace Optimization (STO), enumerates, optimizes, and speculatively executes traces of hot loops. It requires hardware support that can be provided in a similar fashion as that available in Hardware Transactional Memory (HTM) systems. This paper discusses the necessary features to support STO, namely multi-versioning, lazy conflict resolution, eager conflict detection, and transaction synchronization. A review of existing HTM architectures - Intel TSX, IBM BG/Q, and IBM POWER8 - shows that none of them have all the features required to implement STO. However, this work demonstrates that STO can be implemented on top of existing HTM architectures through the addition of privatization and pause/resume code. The evaluation of a prototype STO implementation, on top of Intel TSX, using benchmarks from Parboil, Media Bench, and SPEC2006, indicates that STO can yield whole-program speedups of up to 9%. This initial result is promising given that the prototype has significant overhead caused by the code that compensates for TSX absent features. An analysis, included in the paper, suggests that HTM mechanisms have the potential to considerably improve trace performance provided that they efficiently implement the suggested features.\",\"PeriodicalId\":161685,\"journal\":{\"name\":\"2015 International Symposium on Computer Architecture and High Performance Computing Workshop (SBAC-PADW)\",\"volume\":\"36 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-10-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2015 International Symposium on Computer Architecture and High Performance Computing Workshop (SBAC-PADW)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SBAC-PADW.2015.13\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 International Symposium on Computer Architecture and High Performance Computing Workshop (SBAC-PADW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SBAC-PADW.2015.13","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Using Hardware Transactional Memory to Enable Speculative Trace Optimization
This paper describes a novel speculation technique for the optimization, and simultaneous execution, of multiple alternative traces of hot code regions. This technique, called Speculative Trace Optimization (STO), enumerates, optimizes, and speculatively executes traces of hot loops. It requires hardware support that can be provided in a similar fashion as that available in Hardware Transactional Memory (HTM) systems. This paper discusses the necessary features to support STO, namely multi-versioning, lazy conflict resolution, eager conflict detection, and transaction synchronization. A review of existing HTM architectures - Intel TSX, IBM BG/Q, and IBM POWER8 - shows that none of them have all the features required to implement STO. However, this work demonstrates that STO can be implemented on top of existing HTM architectures through the addition of privatization and pause/resume code. The evaluation of a prototype STO implementation, on top of Intel TSX, using benchmarks from Parboil, Media Bench, and SPEC2006, indicates that STO can yield whole-program speedups of up to 9%. This initial result is promising given that the prototype has significant overhead caused by the code that compensates for TSX absent features. An analysis, included in the paper, suggests that HTM mechanisms have the potential to considerably improve trace performance provided that they efficiently implement the suggested features.