Lightweight Software Transactions on GPUs

Anup Holey, Antonia Zhai
Published in: 2014 43rd International Conference on Parallel Processing (publication date: 2014-10-18). DOI: 10.1109/ICPP.2014.55 (https://doi.org/10.1109/ICPP.2014.55). Source: Semantic Scholar.
Citations: 29

Abstract

Graphics Processing Units (GPUs) provide an attractive option for extracting data-level parallelism from diverse applications. However, some applications, although they possess abundant data-level parallelism, exhibit irregular memory access patterns to shared data structures. Porting such applications to GPUs requires synchronization mechanisms such as locks, which significantly increase programming complexity. Coarse-grained locking, where a single lock controls all the shared resources, reduces programming effort but can substantially serialize GPU threads. Fine-grained locking, where each data element is protected by an independent lock, facilitates maximum parallelism but requires significant programming effort. To overcome these challenges, we propose to support software transactional memory (STM) on GPUs, which is able to achieve performance comparable to fine-grained locking while requiring minimal programming effort. Software-based transactional execution can incur significant runtime overheads due to activities such as detecting conflicts across thousands of GPU threads and managing a consistent memory state. Thus, in this paper we illustrate three lightweight STM designs that are capable of scaling to a large number of GPU threads. In our system, programmers simply mark the critical sections in their applications, and the underlying STM support is able to achieve performance comparable to fine-grained locking.
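
The contrast between coarse-grained and fine-grained locking described in the abstract can be sketched with a small CPU-side analogy. This is not the paper's STM design and not GPU code: it is a minimal C++ illustration using `std::thread` and `std::mutex`, and every name in it (`kBins`, `histogram_coarse`, `histogram_fine`) is hypothetical. Each thread updates a shared histogram at data-dependent indices, once with a single global lock and once with one lock per bin.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical workload: threads increment bins chosen by a data-dependent
// index, standing in for irregular accesses to a shared data structure.
constexpr std::size_t kBins = 64;

// Coarse-grained locking: one mutex guards every bin, so all updates
// serialize even when they touch different bins.
std::vector<long> histogram_coarse(const std::vector<std::size_t>& keys,
                                   unsigned num_threads) {
    std::vector<long> bins(kBins, 0);
    std::mutex global_lock;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            for (std::size_t i = t; i < keys.size(); i += num_threads) {
                std::lock_guard<std::mutex> guard(global_lock);
                ++bins[keys[i] % kBins];
            }
        });
    }
    for (auto& w : workers) w.join();
    return bins;
}

// Fine-grained locking: one mutex per bin, so only updates to the same bin
// contend. The programmer must now manage a lock for every data element.
std::vector<long> histogram_fine(const std::vector<std::size_t>& keys,
                                 unsigned num_threads) {
    std::vector<long> bins(kBins, 0);
    std::array<std::mutex, kBins> bin_locks;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            for (std::size_t i = t; i < keys.size(); i += num_threads) {
                const std::size_t b = keys[i] % kBins;
                std::lock_guard<std::mutex> guard(bin_locks[b]);
                ++bins[b];
            }
        });
    }
    for (auto& w : workers) w.join();
    return bins;
}
```

Both functions produce the same histogram; they differ only in how much the threads serialize and in how many locks the programmer has to manage. The abstract's proposal is that the programmer instead marks the critical section and leaves conflict detection to the STM runtime.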