Increasing GPU throughput using kernel interleaved thread block scheduling

2013 IEEE 31st International Conference on Computer Design (ICCD)
Pub Date: 2013-11-07
DOI: 10.1109/ICCD.2013.6657093

Mihir Awatramani, Joseph Zambreno, D. Rover

Citations: 18

Abstract

The number of active threads required to achieve peak application throughput on graphics processing units (GPUs) depends largely on the ratio of time spent on computation to time spent accessing data from memory. Compute-intensive applications can reach peak throughput with few threads, whereas memory-intensive applications might not achieve good throughput even at the maximum supported thread count. In this paper, we study the effects of scheduling work from multiple applications on the same GPU core. We claim that interleaving workloads from different applications on a GPU core can improve the utilization of computational units and reduce the load on the memory subsystem. Experiments on 17 application pairs from the Rodinia benchmark suite show that overall throughput increases by 7% on average.
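The abstract's first claim can be illustrated with a toy latency-hiding model. This is a simplified sketch and not the model used in the paper: it assumes every thread alternates a fixed number of compute cycles with a fixed number of memory-wait cycles, and that the core can overlap one thread's wait with other threads' computation. The function names, the cycle counts, and the 32-thread limit are all hypothetical values chosen for illustration.

```python
import math


def threads_to_hide_latency(compute_cycles: float, memory_cycles: float) -> int:
    """Threads needed to keep the compute units busy in the toy model.

    Assumption: each thread repeats `compute_cycles` of arithmetic followed
    by `memory_cycles` of waiting on memory; waits overlap with other
    threads' computation.
    """
    if compute_cycles <= 0:
        raise ValueError("compute_cycles must be positive")
    return 1 + math.ceil(memory_cycles / compute_cycles)


def utilization(active_threads: int, compute_cycles: float, memory_cycles: float) -> float:
    """Fraction of cycles the compute units are busy, capped at 1.0."""
    period = compute_cycles + memory_cycles
    return min(1.0, active_threads * compute_cycles / period)


# Hypothetical compute-intensive kernel: 100 compute cycles per 25 wait cycles.
compute_bound_threads = threads_to_hide_latency(100, 25)

# Hypothetical memory-intensive kernel: 10 compute cycles per 400 wait cycles.
memory_bound_threads = threads_to_hide_latency(10, 400)

# Hypothetical hardware limit of 32 active threads.
MAX_THREADS = 32
memory_bound_utilization = utilization(MAX_THREADS, 10, 400)

print(compute_bound_threads, memory_bound_threads, round(memory_bound_utilization, 2))
```

Under these assumed numbers the compute-intensive kernel saturates the core with very few threads, while the memory-intensive kernel would need more threads than the assumed limit allows and so leaves compute units idle. That idle capacity is the kind of slack that interleaving work from a second application is intended to fill.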


