2017 29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)最新文献

筛选
英文 中文
FGSCM: A Fine-Grained Approach to Transactional Lock Elision FGSCM:事务锁省略的细粒度方法
Gustavo José de Sousa, A. Baldassin
{"title":"FGSCM: A Fine-Grained Approach to Transactional Lock Elision","authors":"Gustavo José de Sousa, A. Baldassin","doi":"10.1109/SBAC-PAD.2017.22","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2017.22","url":null,"abstract":"Speculative Lock Elision (SLE) is a technique that allows critical sections to be executed optimistically by eliding the lock operation and enabling multiple threads to execute concurrently. In case of inconsistencies, the hardware automatically rolls back the execution and pessimistically acquires the original lock during runtime. The decision to elide the lock in SLE is performed transparently at the microarchitecture level and, although being convenient, it may sometimes hurt performance. To avoid that case, researchers have investigated Transactional Lock Elision (TLE), in which software-controlled hardware transactions are used instead, allowing the creation of policies and heuristics to manage lock elision. Typical implementations of TLE make use of a single lock to serialize the execution in case the original lock cannot be elided, which can potentially degrade performance. In order to improve on such cases, this paper proposes the Fine-Grained Software-assisted Conflict Management (FGSCM) scheme, a TLE technique that employs multiple locks so as to avoid unnecessary serialization of the code. The main idea of FGSCM is that not all threads that conflict inside a critical section are acessing the same region of shared memory. By automatically assigning distinct locks to these threads according to the memory section they access, the level of concurrency can be increased. In this paper we formalize FGSCM and provide an in-depth performance evaluation using a microbenchmark to stress several conflict behaviors. Our initial results with a prototype implementation using Intels Restricted Transactional Memory (RTM) are encouraging. With a quadcore machine, we observed an average performance gain of 11% compared to the single-auxiliary-lock SCM and 36% compared to a standard lock scheme, both for typical read-dominated workloads.","PeriodicalId":187204,"journal":{"name":"2017 29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121311886","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
A Publish/Subscribe System Using Causal Broadcast over Dynamically Built Spanning Trees 动态生成树上使用因果广播的发布/订阅系统
J. Araujo, L. Arantes, E. P. Duarte, L. A. Rodrigues, Pierre Sens
{"title":"A Publish/Subscribe System Using Causal Broadcast over Dynamically Built Spanning Trees","authors":"J. Araujo, L. Arantes, E. P. Duarte, L. A. Rodrigues, Pierre Sens","doi":"10.1109/SBAC-PAD.2017.28","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2017.28","url":null,"abstract":"In this paper we present VCube-PS, a topic-based Publish/Subscribe system built on the top of a virtual hypercubelike topology. Membership information and published messages to subscribers (members) of a topic group are broadcast over dynamically built spanning trees rooted at the message’s source. For a given topic, delivery of published messages respects causal order. Performance results of experiments conducted on the PeerSim simulator confirm the efficiency of VCube-PS in terms of scalability, latency, number, and size of messages when compared to a single rooted, not dynamically, tree built approach.","PeriodicalId":187204,"journal":{"name":"2017 29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125022867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 9
Object Placement for High Bandwidth Memory Augmented with High Capacity Memory 用高容量存储器增强高带宽存储器的对象放置
M. Laghari, D. Unat
{"title":"Object Placement for High Bandwidth Memory Augmented with High Capacity Memory","authors":"M. Laghari, D. Unat","doi":"10.1109/SBAC-PAD.2017.24","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2017.24","url":null,"abstract":"High bandwidth memory (HBM) is a new emerging technology that aims to improve the performance of bandwidth limited applications. Even though it provides high bandwidth, it must be augmented with DRAM to meet the memory capacity requirement of any applications. Due to this limitation, objects in an application should be optimally placed on the heterogeneous memory subsystems. In this study, we propose an object placement algorithm that places program objects to fast or slow memories in case the capacity of fast memory is insufficient to hold all the objects to increase the overall application performance. Our algorithm uses the reference counts and type of references (read or write) to make an initial placement of data. In addition, we perform various memory bandwidth benchmarks to be used in our placement algorithm on Intel Knights Landing (KNL) architecture. Not surprisingly high bandwidth memory sustains higher read bandwidth than write bandwidth, however, placing write-intensive data on HBM results in better overall performance because write-intensive data is punished by the DRAM speed more severely compared to read intensive data. Moreover, our benchmarks demonstrate that if a basic block makes references to both types of memories, it performs worse than if it makes references to only one type of memory in some cases. We test our proposed placement algorithm with 6 applications under various system configurations. By allocating objects according to our placement scheme, we are able to achieve a speedup of up to 2x.","PeriodicalId":187204,"journal":{"name":"2017 29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127046654","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 9
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信