DrGPU: A Top-Down Profiler for GPU Applications

Yueming Hao, Nikhil Jain, R. Van der Wijngaart, N. Saxena, Yuanbo Fan, Xu Liu
Proceedings of the 2023 ACM/SPEC International Conference on Performance Engineering, published 2023-04-15
DOI: 10.1145/3578244.3583736 (https://doi.org/10.1145/3578244.3583736)
Citations: 4

Abstract

GPUs have become common in HPC systems, where they accelerate scientific computing and machine learning applications. Mapping these applications efficiently onto rapidly evolving GPU architectures is a well-known challenge. GPU kernels exhibit various performance inefficiencies that keep applications from reaching bare-metal performance. Existing tools can measure these inefficiencies, but they mostly focus on data collection and presentation, so understanding the root causes well enough to act on them takes significant manual effort. We therefore develop DrGPU, a novel profiler that performs top-down analysis to guide GPU code optimization. As its salient feature, DrGPU uses the hardware performance counters available in commodity GPUs to quantify stall cycles, decompose them into stall reasons, pinpoint root causes, and provide intuitive optimization guidance. With DrGPU, we analyze important GPU benchmarks and applications and obtain nontrivial speedups: up to 1.77X on V100 and 2.03X on GTX 1650.
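
The abstract's idea of quantifying stall cycles and decomposing them into stall reasons can be illustrated with a minimal sketch. This is not DrGPU's implementation: the function name `decompose_stalls`, the stall-reason labels, and the cycle counts below are all hypothetical, chosen only to show how per-reason shares of total cycles might be ranked so that the largest contributor is examined first.

```python
from typing import Dict, List, Tuple


def decompose_stalls(
    total_cycles: int, stall_cycles: Dict[str, int]
) -> List[Tuple[str, float]]:
    """Return (reason, share of total cycles) pairs, largest share first.

    Hypothetical helper: the inputs stand in for values that a profiler
    would derive from hardware performance counters.
    """
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    shares = [
        (reason, cycles / total_cycles)
        for reason, cycles in stall_cycles.items()
    ]
    # Rank so the dominant stall reason comes first (the top-down starting point).
    return sorted(shares, key=lambda item: item[1], reverse=True)


# Made-up counter readings for a single kernel (assumed, not measured).
sample = {
    "memory_dependency": 450,
    "execution_dependency": 150,
    "synchronization": 100,
}
ranked = decompose_stalls(1000, sample)
for reason, share in ranked:
    print(f"{reason}: {share:.1%}")
```

In this made-up example the memory-related stalls dominate, so a top-down analysis would drill into memory behavior before looking at the smaller categories.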