Exploiting Dynamic Parallelism to Efficiently Support Irregular Nested Loops on GPUs

Da Li, Hancheng Wu, M. Becchi
DOI: 10.1145/2723772.2723780
Published in: Proceedings of the 2015 International Workshop on Code Optimisation for Multi and Many Cores (2015-02-08)
Citations: 6

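Abstract

Graphics Processing Units (GPUs) have been used for general-purpose computing for several years. The Dynamic Parallelism feature newly introduced in Nvidia's Kepler GPUs allows kernels to be launched directly from the GPU. However, naïve use of this feature can cause a large number of nested kernel launches, each performing little work, which leads to GPU underutilization and poor performance. We propose workload consolidation mechanisms at different granularities to maximize the work performed by nested kernels and reduce their overhead. Our end goal is to design automatic code transformation techniques for applications with irregular nested loops.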
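The abstract gives no code, so the following is only a minimal host-side Python model of the launch-count problem it describes. It is not the authors' implementation and not GPU code. It assumes the irregular nested loop is a CSR-style traversal (outer loop over rows, inner loop over each row's entries), and that outer iterations whose inner loop exceeds a threshold are offloaded to a simulated child kernel. Buffering the offloaded work of 1, 32, 256, or all parent threads before issuing a single launch stands in for naive per-thread launches and for warp-, block-, and grid-level consolidation. The names (`Csr`, `run`, `make_example`), the threshold of 16, and those group sizes are hypothetical choices for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Csr:
    """Hypothetical irregular nested loop in CSR form:
    for i in range(rows): for k in range(row_ptr[i], row_ptr[i + 1]): out[i] += val[k]
    """
    row_ptr: list[int]  # length rows + 1
    val: list[int]      # inner-loop payload

    @property
    def rows(self) -> int:
        return len(self.row_ptr) - 1

    def degree(self, i: int) -> int:
        return self.row_ptr[i + 1] - self.row_ptr[i]


def process_row(g: Csr, i: int, out: list[int]) -> None:
    """Execute the inner loop of outer iteration i."""
    for k in range(g.row_ptr[i], g.row_ptr[i + 1]):
        out[i] += g.val[k]


def run(g: Csr, group_size: int, threshold: int) -> tuple[list[int], int]:
    """Model one parent kernel; return (out, number of simulated child launches).

    Outer iterations are split into consecutive groups of `group_size`
    parent threads. Inner loops with degree <= threshold run inline in the
    parent thread; larger ones are buffered, and each group that buffered
    anything issues exactly one child launch covering its whole buffer.
    group_size == 1 models the naive one-launch-per-thread pattern.
    """
    out = [0] * g.rows
    launches = 0
    for start in range(0, g.rows, group_size):
        buffer: list[int] = []  # work consolidated from this group
        for i in range(start, min(start + group_size, g.rows)):
            if g.degree(i) > threshold:
                buffer.append(i)
            else:
                process_row(g, i, out)
        if buffer:
            launches += 1  # one (simulated) child kernel for the whole group
            for i in buffer:
                process_row(g, i, out)
    return out, launches


def make_example(rows: int = 1024) -> Csr:
    """Every 8th outer iteration has 100 inner items; the rest have 0-3."""
    row_ptr = [0]
    for i in range(rows):
        row_ptr.append(row_ptr[-1] + (100 if i % 8 == 0 else i % 4))
    val = [k % 7 for k in range(row_ptr[-1])]
    return Csr(row_ptr, val)


if __name__ == "__main__":
    g = make_example()
    # Assumed group sizes: warp = 32 threads, block = 256 threads, grid = all rows.
    for name, size in [("naive", 1), ("warp", 32), ("block", 256), ("grid", g.rows)]:
        out, launches = run(g, size, threshold=16)
        print(f"{name:>5}: {launches:4d} child launches, checksum {sum(out)}")
```

With this input the number of child launches drops from 128 (naive) to 32 (warp), 4 (block), and 1 (grid), while `out` is identical in every case. The model only counts launches: it does not capture that coarser consolidation needs more synchronization among parent threads before the launch can be issued (grid-wide in the last case), which is part of why the choice of granularity matters.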