2017 29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD): Latest Publications


FGSCM: A Fine-Grained Approach to Transactional Lock Elision

Pub Date: 2017-10-01; DOI: 10.1109/SBAC-PAD.2017.22

Gustavo José de Sousa, A. Baldassin

Abstract: Speculative Lock Elision (SLE) is a technique that allows critical sections to be executed optimistically by eliding the lock operation and enabling multiple threads to execute concurrently. In case of inconsistencies, the hardware automatically rolls back the execution and pessimistically acquires the original lock at runtime. The decision to elide the lock in SLE is made transparently at the microarchitecture level and, although convenient, it may sometimes hurt performance. To avoid that case, researchers have investigated Transactional Lock Elision (TLE), in which software-controlled hardware transactions are used instead, allowing the creation of policies and heuristics to manage lock elision. Typical implementations of TLE use a single lock to serialize execution when the original lock cannot be elided, which can degrade performance. To improve on such cases, this paper proposes the Fine-Grained Software-assisted Conflict Management (FGSCM) scheme, a TLE technique that employs multiple locks to avoid unnecessary serialization of the code. The main idea of FGSCM is that not all threads that conflict inside a critical section are accessing the same region of shared memory. By automatically assigning distinct locks to these threads according to the memory section they access, the level of concurrency can be increased. In this paper we formalize FGSCM and provide an in-depth performance evaluation using a microbenchmark to stress several conflict behaviors. Our initial results with a prototype implementation using Intel's Restricted Transactional Memory (RTM) are encouraging. On a quad-core machine, we observed an average performance gain of 11% compared to the single-auxiliary-lock SCM and 36% compared to a standard lock scheme, both for typical read-dominated workloads.
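The lock-assignment idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how memory regions map to locks, so the fixed-size regions, the modulo mapping, and the names `StripedLocks`, `index_for`, and `lock_for` are all assumptions made for illustration. Python cannot issue hardware transactions, so only the step of choosing an auxiliary lock from an address is shown.

```python
import threading


class StripedLocks:
    """Hypothetical helper: picks one of several auxiliary locks per memory region.

    Assumption: a region is a fixed-size block of addresses, and regions are
    mapped to locks by a simple modulo. The paper may use a different mapping.
    """

    def __init__(self, num_locks=8, region_bytes=64):
        self.region_bytes = region_bytes
        self.locks = [threading.Lock() for _ in range(num_locks)]

    def index_for(self, address):
        # Addresses in the same region share an index; distinct regions
        # may still collide once the region number wraps around.
        return (address // self.region_bytes) % len(self.locks)

    def lock_for(self, address):
        return self.locks[self.index_for(address)]


# Two threads that conflicted on different regions would serialize on
# different locks, so they would not block each other.
striped = StripedLocks(num_locks=4, region_bytes=64)
with striped.lock_for(0):
    with striped.lock_for(64):  # different region, different lock
        pass
```

With a single auxiliary lock, the nested acquisition above would deadlock a non-reentrant lock; with per-region locks the two acquisitions are independent, which is the extra concurrency the abstract describes.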

Citations: 1

A Publish/Subscribe System Using Causal Broadcast over Dynamically Built Spanning Trees

Pub Date: 2017-06-26; DOI: 10.1109/SBAC-PAD.2017.28

J. Araujo, L. Arantes, E. P. Duarte, L. A. Rodrigues, Pierre Sens

Citations: 9

Object Placement for High Bandwidth Memory Augmented with High Capacity Memory

2017 29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) Pub Date : 1900-01-01 DOI: 10.1109/SBAC-PAD.2017.24

M. Laghari, D. Unat

Abstract: High bandwidth memory (HBM) is an emerging technology that aims to improve the performance of bandwidth-limited applications. Even though it provides high bandwidth, it must be augmented with DRAM to meet the memory capacity requirements of applications. Because of this limitation, objects in an application should be optimally placed on the heterogeneous memory subsystems. In this study, we propose an object placement algorithm that assigns program objects to fast or slow memory when the capacity of fast memory is insufficient to hold all the objects, in order to increase overall application performance. Our algorithm uses the reference counts and the type of references (read or write) to make an initial placement of data. In addition, we run various memory bandwidth benchmarks on the Intel Knights Landing (KNL) architecture to inform our placement algorithm. Not surprisingly, high bandwidth memory sustains higher read bandwidth than write bandwidth. However, placing write-intensive data on HBM results in better overall performance, because write-intensive data is penalized by the DRAM speed more severely than read-intensive data. Moreover, our benchmarks demonstrate that, in some cases, a basic block that makes references to both types of memory performs worse than one that makes references to only one type of memory. We test our proposed placement algorithm with 6 applications under various system configurations. By allocating objects according to our placement scheme, we are able to achieve a speedup of up to 2x.
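As a rough illustration of placement driven by reference counts and reference type, the sketch below ranks objects by a weighted access score per byte and fills fast memory greedily. This is not the paper's algorithm: the greedy strategy, the `write_weight` value, and the names `ProgramObject` and `place_objects` are assumptions. The only ideas taken from the abstract are that reads and writes are counted separately and that write-intensive data benefits more from fast memory.

```python
from dataclasses import dataclass


@dataclass
class ProgramObject:
    name: str
    size_bytes: int  # assumed to be greater than zero
    reads: int
    writes: int


def place_objects(objects, fast_capacity_bytes, write_weight=2.0):
    """Hypothetical greedy placement: returns {object name: "fast" or "slow"}.

    Assumption: writes count write_weight times as much as reads, reflecting
    the abstract's observation that write-intensive data suffers more on DRAM.
    """

    def density(obj):
        # Weighted references per byte of fast memory consumed.
        return (obj.reads + write_weight * obj.writes) / obj.size_bytes

    placement = {}
    remaining = fast_capacity_bytes
    for obj in sorted(objects, key=density, reverse=True):
        if obj.size_bytes <= remaining:
            placement[obj.name] = "fast"
            remaining -= obj.size_bytes
        else:
            placement[obj.name] = "slow"
    return placement


objs = [
    ProgramObject("read_heavy", 100, reads=1000, writes=0),
    ProgramObject("write_heavy", 100, reads=0, writes=1000),
    ProgramObject("cold", 100, reads=10, writes=10),
]
print(place_objects(objs, fast_capacity_bytes=100))
```

When fast memory holds only one of the three objects, the write-heavy one is chosen over the read-heavy one because of the write weighting. A greedy density ranking is a heuristic and does not guarantee the best possible packing.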

Citations: 9
