细粒度、不规则工作负载的动态并行性性能评估

Max Plauth, Frank Feinbube, F. Schlegel, A. Polze
{"title":"细粒度、不规则工作负载的动态并行性性能评估","authors":"Max Plauth, Frank Feinbube, F. Schlegel, A. Polze","doi":"10.15803/IJNC.6.2_212","DOIUrl":null,"url":null,"abstract":"GPU compute devices have become very popular for general purpose computations. However, the SIMD-like hardware of graphics processors is currently not well suited for irregular workloads, like searching unbalanced trees. In order to mitigate this drawback, NVIDIA introduced an extension to GPU programming models called Dynamic Parallelism. This extension enables GPU programs to spawn new units of work directly on the GPU, allowing the refinement of subsequent work items based on intermediate results without any involvement of the main CPU. This work investigates methods for employing Dynamic Parallelism with the goal of improved workload distribution for tree search algorithms on modern GPU hardware. For the evaluation of the proposed approaches, a case study is conducted on the N-Queens problem. Extensive benchmarks indicate that the benefits of improved resource utilization fail to outweigh high management overhead and runtime limitations due to the very fine level of granularity of the investigated problem. However, novel memory management concepts for passing parameters to child grids are presented. These general concepts are applicable to other, more coarse-grained problems that benefit from the use of Dynamic Parallelism.","PeriodicalId":270166,"journal":{"name":"Int. J. Netw. Comput.","volume":"31 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":"{\"title\":\"A Performance Evaluation of Dynamic Parallelism for Fine-Grained, Irregular Workloads\",\"authors\":\"Max Plauth, Frank Feinbube, F. Schlegel, A. Polze\",\"doi\":\"10.15803/IJNC.6.2_212\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"GPU compute devices have become very popular for general purpose computations. However, the SIMD-like hardware of graphics processors is currently not well suited for irregular workloads, like searching unbalanced trees. In order to mitigate this drawback, NVIDIA introduced an extension to GPU programming models called Dynamic Parallelism. This extension enables GPU programs to spawn new units of work directly on the GPU, allowing the refinement of subsequent work items based on intermediate results without any involvement of the main CPU. This work investigates methods for employing Dynamic Parallelism with the goal of improved workload distribution for tree search algorithms on modern GPU hardware. For the evaluation of the proposed approaches, a case study is conducted on the N-Queens problem. Extensive benchmarks indicate that the benefits of improved resource utilization fail to outweigh high management overhead and runtime limitations due to the very fine level of granularity of the investigated problem. However, novel memory management concepts for passing parameters to child grids are presented. These general concepts are applicable to other, more coarse-grained problems that benefit from the use of Dynamic Parallelism.\",\"PeriodicalId\":270166,\"journal\":{\"name\":\"Int. J. Netw. Comput.\",\"volume\":\"31 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-07-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"9\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Int. J. Netw. Comput.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.15803/IJNC.6.2_212\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Int. J. Netw. Comput.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.15803/IJNC.6.2_212","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 9

摘要

GPU计算设备已经成为非常流行的通用计算。然而,图形处理器类似simd的硬件目前不太适合不规则的工作负载,比如搜索不平衡树。为了减轻这个缺点,NVIDIA为GPU编程模型引入了一个名为动态并行的扩展。此扩展使GPU程序能够直接在GPU上产生新的工作单元,允许根据中间结果对后续工作项进行细化,而无需主CPU的任何参与。这项工作研究了采用动态并行的方法,目的是改善树搜索算法在现代GPU硬件上的工作负载分布。为了评估所提出的方法,对N-Queens问题进行了案例研究。广泛的基准测试表明,由于所调查问题的粒度非常细,改进资源利用率的好处无法超过高管理开销和运行时限制。然而,提出了将参数传递给子网格的新的内存管理概念。这些一般概念适用于其他更粗粒度的问题,这些问题受益于动态并行的使用。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
A Performance Evaluation of Dynamic Parallelism for Fine-Grained, Irregular Workloads
GPU compute devices have become very popular for general purpose computations. However, the SIMD-like hardware of graphics processors is currently not well suited for irregular workloads, like searching unbalanced trees. In order to mitigate this drawback, NVIDIA introduced an extension to GPU programming models called Dynamic Parallelism. This extension enables GPU programs to spawn new units of work directly on the GPU, allowing the refinement of subsequent work items based on intermediate results without any involvement of the main CPU. This work investigates methods for employing Dynamic Parallelism with the goal of improved workload distribution for tree search algorithms on modern GPU hardware. For the evaluation of the proposed approaches, a case study is conducted on the N-Queens problem. Extensive benchmarks indicate that the benefits of improved resource utilization fail to outweigh high management overhead and runtime limitations due to the very fine level of granularity of the investigated problem. However, novel memory management concepts for passing parameters to child grids are presented. These general concepts are applicable to other, more coarse-grained problems that benefit from the use of Dynamic Parallelism.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信