POSTER: Accelerate GPU Concurrent Kernel Execution by Mitigating Memory Pipeline Stalls

2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT). Pub Date: 2017-09-01. DOI: 10.1109/PACT.2017.30

Hongwen Dai, Zhen Lin, C. Li, Chen Zhao, Fei Wang, Nanning Zheng, Huiyang Zhou

Citations: 2

Abstract

In this study, we demonstrate that state-of-the-art intra-SM sharing schemes for concurrent kernel execution (CKE) on GPUs can suffer degraded performance due to interference among concurrent kernels. We highlight that cache partitioning techniques proposed for CPUs are not effective for GPUs. We then propose balancing memory accesses and limiting the number of in-flight memory instructions issued from concurrent kernels to reduce memory pipeline stalls. Our proposed schemes significantly improve the performance of two state-of-the-art intra-SM sharing schemes, Warped-Slicer and SMK.
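
The abstract does not describe how the in-flight limit is implemented, so the following is only a toy sketch of the general idea, not the authors' mechanism. It assumes a shared memory pipeline with a fixed number of slots and a per-kernel cap on in-flight memory instructions; the names `InflightLimiter`, `simulate`, `caps`, and `pipeline_slots` are hypothetical and chosen for illustration.

```python
from collections import defaultdict


class InflightLimiter:
    """Toy per-kernel cap on in-flight memory instructions (hypothetical model)."""

    def __init__(self, caps):
        # caps: kernel id -> maximum memory instructions allowed in flight
        self.caps = dict(caps)
        self.inflight = defaultdict(int)

    def try_issue(self, kernel):
        """Issue one memory instruction if the kernel is under its cap."""
        if self.inflight[kernel] >= self.caps[kernel]:
            return False  # kernel is throttled; the instruction must wait
        self.inflight[kernel] += 1
        return True

    def complete(self, kernel):
        """Retire one in-flight memory instruction of the kernel."""
        if self.inflight[kernel] == 0:
            raise ValueError(f"kernel {kernel!r} has nothing in flight")
        self.inflight[kernel] -= 1


def simulate(requests, caps, pipeline_slots):
    """Offer requests in order to a shared pipeline with a fixed slot count.

    Nothing retires during the run, so this models a single burst.
    Returns how many instructions each kernel managed to issue.
    """
    limiter = InflightLimiter(caps)
    issued = defaultdict(int)
    free = pipeline_slots
    for kernel in requests:
        # A request issues only if a slot is free AND the kernel is under its cap.
        if free > 0 and limiter.try_issue(kernel):
            issued[kernel] += 1
            free -= 1
    return dict(issued)


# Kernel "A" is memory-intensive and floods the pipeline before "B" gets a turn.
burst = ["A"] * 8 + ["B"] * 2
uncapped = simulate(burst, caps={"A": 8, "B": 8}, pipeline_slots=4)
capped = simulate(burst, caps={"A": 3, "B": 3}, pipeline_slots=4)
print(uncapped)  # {'A': 4}
print(capped)    # {'A': 3, 'B': 1}
```

In this simplified model, the uncapped run lets kernel A occupy every pipeline slot, so kernel B stalls; with a cap of three per kernel, one slot stays available for B. A real SM would choose the caps dynamically and retire instructions over time, which this sketch deliberately leaves out.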
