A Micro-benchmark Suite for AMD GPUs

Ryan Taylor, Xiaoming Li
Published in: 2010 39th International Conference on Parallel Processing Workshops
Publication date: 2010-09-13
DOI: 10.1109/ICPPW.2010.59 (https://doi.org/10.1109/ICPPW.2010.59)
Citations: 30

Abstract

Optimizing programs for the Graphics Processing Unit (GPU) requires thorough knowledge of the values of the platform's architectural features. However, this knowledge is frequently unavailable, e.g., due to insufficient documentation, which is probably a result of the infancy of general-purpose computing on the GPU. Modeling program performance on the GPU is even more difficult because the exact value of some “architectural” parameters depends on how a GPU program interacts with those features. For example, AMD GPUs show different memory latencies when memory is accessed with address sequences that follow different patterns. Current micro-benchmark suites such as X-Ray are unable to characterize the GPU. Clearly, a prerequisite for efficient code optimization and automatic tuning on the GPU is a systematic method to measure the architectural features and identify the most basic program characteristics that determine the performance of a program on the new GPU architectures. In this paper, we present a micro-benchmark suite for AMD GPUs that supports the AMD StreamSDK. Our model identifies and measures a series of architectural features and basic program characteristics that are most important and most predictive for program performance on the platform. The features and characteristics include vectorization, burst write latency, texture fetch latency, global read and write latency, ALU/Fetch operation ratio, domain size, and register usage for both AMD’s pixel shader and compute shader modes. Our performance model not only generates correct values for those parameters, but also provides a clear picture of program performance on the GPU.
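The abstract's observation that measured memory latency depends on the address pattern can be illustrated with a generic pointer-chasing micro-benchmark. The sketch below is not the paper's suite and does not use the AMD StreamSDK; it is a host-side Python illustration, and the names `build_chain`, `chase`, and `time_per_access` are hypothetical. Each load depends on the previous one, so elapsed time divided by the number of steps approximates per-access latency for a given stride. In Python, interpreter overhead dominates the timing, so only the structure of the measurement carries over to a real GPU kernel.

```python
import time


def build_chain(n, stride):
    """Build a next-index table where slot i points to (i + stride) mod n.

    Assumption: stride is coprime with n, so the chain is a single cycle
    that visits every slot exactly once.
    """
    return [(i + stride) % n for i in range(n)]


def chase(chain, steps):
    """Follow the chain for `steps` dependent loads and return the final index."""
    idx = 0
    for _ in range(steps):
        idx = chain[idx]  # each load depends on the result of the previous one
    return idx


def time_per_access(n, stride, steps=200_000):
    """Return the average wall-clock seconds per dependent access."""
    chain = build_chain(n, stride)
    start = time.perf_counter()
    chase(chain, steps)
    return (time.perf_counter() - start) / steps


if __name__ == "__main__":
    n = 1 << 16  # table size; the odd strides below are coprime with it
    for stride in (1, 17, 4097):
        ns = time_per_access(n, stride) * 1e9
        print(f"stride={stride:5d}: {ns:.1f} ns/access")
```

Sweeping the stride while holding the table size fixed is one way to expose pattern-dependent latency; a GPU version would run the same dependent-load loop inside a kernel and read a device timer instead of `time.perf_counter`.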