TBPoint: Reducing Simulation Time for Large-Scale GPGPU Kernels

2014 IEEE 28th International Parallel and Distributed Processing Symposium. Pub Date: 2014-05-19. DOI: 10.1109/IPDPS.2014.53

Jen-Cheng Huang, Lifeng Nai, Hyesoon Kim, H. Lee

Citations: 17

Abstract

Architecture simulation of GPGPU kernels can take a significant amount of time, especially for large-scale kernels. This paper presents TBPoint, an infrastructure that uses profiling-based sampling of GPGPU kernels to reduce cycle-level simulation time. Compared to existing approaches, TBPoint provides a flexible and architecture-independent way to take samples. For the 12 evaluated kernels, the geometric means of the sampling errors of TBPoint, Ideal-Simpoint, and random sampling are 0.47%, 1.74%, and 7.95%, respectively, while the geometric means of their total sample sizes are 2.6%, 5.4%, and 10%, respectively. TBPoint narrows the speed gap between hardware and GPGPU simulators, enabling more large-scale GPGPU kernels to be analyzed using detailed timing simulation.
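The abstract reports geometric means of per-kernel sampling errors. As a minimal sketch of that arithmetic only (not TBPoint's implementation), the following Python computes a relative sampling error per kernel and then the geometric mean across kernels. The function names, the choice of metric, and the three value pairs are hypothetical; the abstract does not state which metric the error is measured on, and the numbers below are not results from the paper.

```python
import math


def sampling_error(full_value, estimated_value):
    """Relative error, in percent, of a sampled estimate against the full simulation."""
    return abs(estimated_value - full_value) / full_value * 100.0


def geometric_mean(values):
    """Geometric mean of strictly positive values, computed in log space."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs at least one strictly positive value")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical (full-simulation value, sampled estimate) pairs for three kernels.
# These are illustrative numbers only, not data from the paper.
hypothetical_kernels = [(2.0, 2.02), (1.0, 0.98), (4.0, 4.16)]

errors = [sampling_error(full, est) for full, est in hypothetical_kernels]
geomean_error = geometric_mean(errors)
print(f"per-kernel errors (%): {errors}")
print(f"geometric mean error (%): {geomean_error:.2f}")
```

A geometric mean is less dominated by a single kernel with a large error than an arithmetic mean would be, which is one reason it is commonly used to summarize ratios across benchmarks.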



