GPUMech: GPU Performance Modeling Technique Based on Interval Analysis

Jen-Cheng Huang, Joo Hwan Lee, Hyesoon Kim, H. Lee
{"title":"GPUMech: GPU Performance Modeling Technique Based on Interval Analysis","authors":"Jen-Cheng Huang, Joo Hwan Lee, Hyesoon Kim, H. Lee","doi":"10.1109/MICRO.2014.59","DOIUrl":null,"url":null,"abstract":"GPU has become a first-order computing plat-form. Nonetheless, not many performance modeling techniques have been developed for architecture studies. Several GPU analytical performance models have been proposed, but they mostly target application optimizations rather than the study of different architecture design options. Interval analysis is a relatively accurate performance modeling technique, which traverses the instruction trace and uses functional simulators, e.g., Cache simulator, to track the stall events that cause performance loss. It shows hundred times of speedup compared to detailed timing simulations and better accuracy compared to pure analytical models. However, previous techniques are limited to CPUs and not applicable to multithreaded architectures. In this work, we propose GPU Mech, an interval analysis-based performance modeling technique for GPU architectures. GPU Mech models multithreading and resource contentions caused by memory divergence. We compare GPU Mech with a detailed timing simulator and show that on average, GPU Mechhas 13.2% error for modeling the round-robin scheduling policy and 14.0% error for modeling the greedy-then-oldest policy while achieving a 97x faster simulation speed. In addition, GPU Mech generates CPI stacks, which help hardware/software developers to visualize performance bottlenecks of a kernel.","PeriodicalId":6591,"journal":{"name":"2014 47th Annual IEEE/ACM International Symposium on Microarchitecture","volume":"9 1","pages":"268-279"},"PeriodicalIF":0.0000,"publicationDate":"2014-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"44","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 47th Annual IEEE/ACM International Symposium on Microarchitecture","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/MICRO.2014.59","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 44

Abstract

GPUs have become a first-order computing platform. Nonetheless, few performance modeling techniques have been developed for architecture studies. Several GPU analytical performance models have been proposed, but they mostly target application optimization rather than the study of different architecture design options. Interval analysis is a relatively accurate performance modeling technique that traverses the instruction trace and uses functional simulators, e.g., a cache simulator, to track the stall events that cause performance loss. It achieves roughly a hundredfold speedup over detailed timing simulation and better accuracy than pure analytical models. However, previous interval-analysis techniques are limited to CPUs and are not applicable to multithreaded architectures. In this work, we propose GPUMech, an interval analysis-based performance modeling technique for GPU architectures. GPUMech models multithreading and the resource contention caused by memory divergence. We compare GPUMech with a detailed timing simulator and show that, on average, GPUMech has 13.2% error when modeling the round-robin scheduling policy and 14.0% error when modeling the greedy-then-oldest policy, while achieving a 97x faster simulation speed. In addition, GPUMech generates CPI stacks, which help hardware/software developers visualize the performance bottlenecks of a kernel.
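To make the interval-analysis idea referenced in the abstract concrete, the sketch below walks a single instruction trace, attributes cycles either to steady-state issue or to stall intervals flagged by a toy functional cache model, and reports the result as a CPI stack. This is not GPUMech's implementation and does not model multithreading or memory divergence; the trace format, cache geometry, and latencies (issue_cpi, miss_penalty) are hypothetical placeholders chosen only to show the bookkeeping.

```python
# Illustrative sketch of interval-analysis-style CPI-stack accumulation.
# All structures and latencies below are hypothetical, not from the paper.
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class TraceInst:
    opcode: str                   # e.g. "add", "load"
    addr: Optional[int] = None    # memory address for loads/stores, else None


class SimpleCache:
    """Toy direct-mapped cache standing in for the functional cache simulator."""

    def __init__(self, num_lines: int = 256, line_size: int = 128):
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines

    def access(self, addr: int) -> bool:
        line = (addr // self.line_size) % self.num_lines
        tag = addr // (self.line_size * self.num_lines)
        hit = self.tags[line] == tag
        self.tags[line] = tag
        return hit


def cpi_stack(trace, issue_cpi=1.0, miss_penalty=400):
    """Accumulate cycles into named buckets while traversing one trace."""
    cache = SimpleCache()
    cycles = defaultdict(float)
    for inst in trace:
        cycles["base"] += issue_cpi              # steady-state issue cost
        if inst.addr is not None and not cache.access(inst.addr):
            cycles["memory"] += miss_penalty     # stall interval on a cache miss
    n = len(trace)
    return {k: v / n for k, v in cycles.items()}, sum(cycles.values()) / n


if __name__ == "__main__":
    trace = [TraceInst("add")] * 8 + [TraceInst("load", addr=i * 4096) for i in range(4)]
    stack, cpi = cpi_stack(trace)
    print("CPI stack:", stack, "total CPI:", round(cpi, 2))
```

The buckets in the returned dictionary are the CPI-stack components: dividing each by the instruction count shows how many cycles per instruction are lost to each stall class, which is the kind of breakdown the paper uses to visualize kernel bottlenecks.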