Da Li, Hancheng Wu, M. Becchi
Proceedings of the 2015 International Workshop on Code Optimisation for Multi and Many Cores, February 8, 2015
DOI: 10.1145/2723772.2723780
Exploiting Dynamic Parallelism to Efficiently Support Irregular Nested Loops on GPUs
Graphics Processing Units (GPUs) have been used in general purpose computing for several years. The newly introduced Dynamic Parallelism feature of Nvidia's Kepler GPUs allows launching kernels from the GPU directly. However, the naïve use of this feature can cause a high number of nested kernel launches, each performing limited work, leading to GPU underutilization and poor performance. We propose workload consolidation mechanisms at different granularities to maximize the work performed by nested kernels and reduce their overhead. Our end goal is to design automatic code transformation techniques for applications with irregular nested loops.
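The launch-overhead problem the abstract describes can be sketched in CUDA. This is a minimal illustration, not the paper's implementation: all kernel and variable names are assumptions, and the "consolidation" shown (parent threads enqueueing inner iterations into a shared work buffer so a single child kernel can process them) is one simple instance of the coarser-granularity launches the authors propose.

```cuda
// Naive dynamic parallelism: each parent thread launches its own child
// kernel for a variable-length inner loop (here, one CSR row). With many
// short rows this creates a flood of tiny kernels, each doing little work.
__global__ void child(const int *data, int *out, int begin, int end) {
    int i = begin + blockIdx.x * blockDim.x + threadIdx.x;
    if (i < end) out[i] = data[i] * 2;   // placeholder inner-loop work
}

__global__ void parent_naive(const int *row_ptr, const int *data,
                             int *out, int n_rows) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= n_rows) return;
    int begin = row_ptr[row], end = row_ptr[row + 1];
    int len = end - begin;
    if (len > 0) {
        // One device-side launch per row: high aggregate launch overhead.
        child<<<(len + 31) / 32, 32>>>(data, out, begin, end);
    }
}

// Consolidated variant (hypothetical): parent threads append their inner
// iterations to a shared queue instead of launching kernels. A single
// child kernel is then launched once over queue[0 .. *queue_len),
// amortizing launch cost across all rows.
__global__ void parent_consolidated(const int *row_ptr, int *queue,
                                    int *queue_len, int n_rows) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= n_rows) return;
    for (int i = row_ptr[row]; i < row_ptr[row + 1]; i++) {
        int pos = atomicAdd(queue_len, 1);
        queue[pos] = i;                  // enqueue one inner iteration
    }
}
```

In the consolidated version the nested-loop work of all parent threads is gathered before any child launch, so one appropriately sized grid replaces `n_rows` tiny ones; device-side launches require compiling with relocatable device code (`-rdc=true`) on a GPU supporting dynamic parallelism (compute capability 3.5+).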