GPUMech: GPU Performance Modeling Technique Based on Interval Analysis

Jen-Cheng Huang, Joo Hwan Lee, Hyesoon Kim, H. Lee
{"title":"GPUMech: GPU Performance Modeling Technique Based on Interval Analysis","authors":"Jen-Cheng Huang, Joo Hwan Lee, Hyesoon Kim, H. Lee","doi":"10.1109/MICRO.2014.59","DOIUrl":null,"url":null,"abstract":"GPU has become a first-order computing plat-form. Nonetheless, not many performance modeling techniques have been developed for architecture studies. Several GPU analytical performance models have been proposed, but they mostly target application optimizations rather than the study of different architecture design options. Interval analysis is a relatively accurate performance modeling technique, which traverses the instruction trace and uses functional simulators, e.g., Cache simulator, to track the stall events that cause performance loss. It shows hundred times of speedup compared to detailed timing simulations and better accuracy compared to pure analytical models. However, previous techniques are limited to CPUs and not applicable to multithreaded architectures. In this work, we propose GPU Mech, an interval analysis-based performance modeling technique for GPU architectures. GPU Mech models multithreading and resource contentions caused by memory divergence. We compare GPU Mech with a detailed timing simulator and show that on average, GPU Mechhas 13.2% error for modeling the round-robin scheduling policy and 14.0% error for modeling the greedy-then-oldest policy while achieving a 97x faster simulation speed. In addition, GPU Mech generates CPI stacks, which help hardware/software developers to visualize performance bottlenecks of a kernel.","PeriodicalId":6591,"journal":{"name":"2014 47th Annual IEEE/ACM International Symposium on Microarchitecture","volume":"9 1","pages":"268-279"},"PeriodicalIF":0.0000,"publicationDate":"2014-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"44","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 47th Annual IEEE/ACM International Symposium on Microarchitecture","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/MICRO.2014.59","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 44

Abstract

GPUs have become a first-order computing platform. Nonetheless, few performance modeling techniques have been developed for architecture studies. Several GPU analytical performance models have been proposed, but they mostly target application optimization rather than the study of different architecture design options. Interval analysis is a relatively accurate performance modeling technique that traverses the instruction trace and uses functional simulators, e.g., a cache simulator, to track the stall events that cause performance loss. It achieves roughly a hundredfold speedup over detailed timing simulation and better accuracy than pure analytical models. However, previous interval-analysis techniques are limited to CPUs and are not applicable to multithreaded architectures. In this work, we propose GPUMech, an interval analysis-based performance modeling technique for GPU architectures. GPUMech models multithreading and the resource contention caused by memory divergence. We compare GPUMech with a detailed timing simulator and show that, on average, GPUMech has 13.2% error when modeling the round-robin scheduling policy and 14.0% error when modeling the greedy-then-oldest policy, while achieving a 97x faster simulation speed. In addition, GPUMech generates CPI stacks, which help hardware/software developers visualize the performance bottlenecks of a kernel.
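To make the interval-analysis idea referenced in the abstract concrete, the sketch below walks a single instruction trace, attributes cycles either to steady-state issue or to stall intervals flagged by a toy functional cache model, and reports the result as a CPI stack. This is not GPUMech's implementation and does not model multithreading or memory divergence; the trace format, cache geometry, and latencies (issue_cpi, miss_penalty) are hypothetical placeholders chosen only to show the bookkeeping.

```python
# Illustrative sketch of interval-analysis-style CPI-stack accumulation.
# All structures and latencies below are hypothetical, not from the paper.
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class TraceInst:
    opcode: str                   # e.g. "add", "load"
    addr: Optional[int] = None    # memory address for loads/stores, else None


class SimpleCache:
    """Toy direct-mapped cache standing in for the functional cache simulator."""

    def __init__(self, num_lines: int = 256, line_size: int = 128):
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines

    def access(self, addr: int) -> bool:
        line = (addr // self.line_size) % self.num_lines
        tag = addr // (self.line_size * self.num_lines)
        hit = self.tags[line] == tag
        self.tags[line] = tag
        return hit


def cpi_stack(trace, issue_cpi=1.0, miss_penalty=400):
    """Accumulate cycles into named buckets while traversing one trace."""
    cache = SimpleCache()
    cycles = defaultdict(float)
    for inst in trace:
        cycles["base"] += issue_cpi              # steady-state issue cost
        if inst.addr is not None and not cache.access(inst.addr):
            cycles["memory"] += miss_penalty     # stall interval on a cache miss
    n = len(trace)
    return {k: v / n for k, v in cycles.items()}, sum(cycles.values()) / n


if __name__ == "__main__":
    trace = [TraceInst("add")] * 8 + [TraceInst("load", addr=i * 4096) for i in range(4)]
    stack, cpi = cpi_stack(trace)
    print("CPI stack:", stack, "total CPI:", round(cpi, 2))
```

The buckets in the returned dictionary are the CPI-stack components: dividing each by the instruction count shows how many cycles per instruction are lost to each stall class, which is the kind of breakdown the paper uses to visualize kernel bottlenecks.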