gpu上的轻量级软件事务

2014 43rd International Conference on Parallel Processing Pub Date : 2014-10-18 DOI:10.1109/ICPP.2014.55

Anup Holey, Antonia Zhai

{"title":"gpu上的轻量级软件事务","authors":"Anup Holey, Antonia Zhai","doi":"10.1109/ICPP.2014.55","DOIUrl":null,"url":null,"abstract":"Graphics Processing Units (GPUs) provide an attractive option for extracting data-level parallelism from diverse applications. However, some applications, although possess abundant data-level parallelism, exhibit irregular memory access patterns to the shared data structures. Porting such applications to GPUs requires synchronization mechanisms such as locks, which significantly increase the programming complexity. Coarse-grained locking, where a single lock controls all the shared resources, although reduces programming efforts, can substantially serialize GPU threads. On the other hand, fine-grained locking, where each data element is protected by an independent lock, although facilitates maximum parallelism, requires significant programming efforts. To overcome these challenges, we propose to support software transactional memory (STM) on GPU that is able to achieve performance comparable to fine-grained locking, while requiring minimal programming efforts. Software-based transactional execution can incur significant runtime overheads due to activities such as detecting conflicts across thousands of GPU threads and managing a consistent memory state. Thus, in this paper we illustrate three lightweight STM designs that are capable of scaling to a large number of GPU threads. In our system, programmers simply mark the critical sections in the applications, and the underlying STM support is able to achieve performance comparable to fine-grained locking.","PeriodicalId":441115,"journal":{"name":"2014 43rd International Conference on Parallel Processing","volume":"18 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"29","resultStr":"{\"title\":\"Lightweight Software Transactions on GPUs\",\"authors\":\"Anup Holey, Antonia Zhai\",\"doi\":\"10.1109/ICPP.2014.55\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Graphics Processing Units (GPUs) provide an attractive option for extracting data-level parallelism from diverse applications. However, some applications, although possess abundant data-level parallelism, exhibit irregular memory access patterns to the shared data structures. Porting such applications to GPUs requires synchronization mechanisms such as locks, which significantly increase the programming complexity. Coarse-grained locking, where a single lock controls all the shared resources, although reduces programming efforts, can substantially serialize GPU threads. On the other hand, fine-grained locking, where each data element is protected by an independent lock, although facilitates maximum parallelism, requires significant programming efforts. To overcome these challenges, we propose to support software transactional memory (STM) on GPU that is able to achieve performance comparable to fine-grained locking, while requiring minimal programming efforts. Software-based transactional execution can incur significant runtime overheads due to activities such as detecting conflicts across thousands of GPU threads and managing a consistent memory state. Thus, in this paper we illustrate three lightweight STM designs that are capable of scaling to a large number of GPU threads. In our system, programmers simply mark the critical sections in the applications, and the underlying STM support is able to achieve performance comparable to fine-grained locking.\",\"PeriodicalId\":441115,\"journal\":{\"name\":\"2014 43rd International Conference on Parallel Processing\",\"volume\":\"18 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-10-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"29\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2014 43rd International Conference on Parallel Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICPP.2014.55\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 43rd International Conference on Parallel Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICPP.2014.55","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 29

摘要

图形处理单元(gpu)为从不同的应用程序中提取数据级并行性提供了一个有吸引力的选择。然而，有些应用程序虽然具有丰富的数据级并行性，但对共享数据结构表现出不规则的内存访问模式。将这样的应用程序移植到gpu上需要锁等同步机制，这大大增加了编程的复杂性。粗粒度锁，其中单个锁控制所有共享资源，虽然减少了编程工作，但可以大量序列化GPU线程。另一方面，细粒度锁(其中每个数据元素都由一个独立的锁保护)虽然促进了最大的并行性，但需要大量的编程工作。为了克服这些挑战，我们建议在GPU上支持软件事务性内存(STM)，它能够实现与细粒度锁定相当的性能，同时需要最少的编程工作。基于软件的事务执行可能会由于检测跨数千个GPU线程的冲突和管理一致的内存状态等活动而导致显著的运行时开销。因此，在本文中，我们演示了三种能够扩展到大量GPU线程的轻量级STM设计。在我们的系统中，程序员只需在应用程序中标记关键区域，底层的STM支持能够实现与细粒度锁定相当的性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Lightweight Software Transactions on GPUs

Graphics Processing Units (GPUs) provide an attractive option for extracting data-level parallelism from diverse applications. However, some applications, although possess abundant data-level parallelism, exhibit irregular memory access patterns to the shared data structures. Porting such applications to GPUs requires synchronization mechanisms such as locks, which significantly increase the programming complexity. Coarse-grained locking, where a single lock controls all the shared resources, although reduces programming efforts, can substantially serialize GPU threads. On the other hand, fine-grained locking, where each data element is protected by an independent lock, although facilitates maximum parallelism, requires significant programming efforts. To overcome these challenges, we propose to support software transactional memory (STM) on GPU that is able to achieve performance comparable to fine-grained locking, while requiring minimal programming efforts. Software-based transactional execution can incur significant runtime overheads due to activities such as detecting conflicts across thousands of GPU threads and managing a consistent memory state. Thus, in this paper we illustrate three lightweight STM designs that are capable of scaling to a large number of GPU threads. In our system, programmers simply mark the critical sections in the applications, and the underlying STM support is able to achieve performance comparable to fine-grained locking.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2014 43rd International Conference on Parallel Processing

自引率

0.00%

发文量