Thread Similarity Matrix: Visualizing Branch Divergence in GPGPU Programs

Zhibin Yu, L. Eeckhout, Chengzhong Xu
2016 45th International Conference on Parallel Processing (ICPP), August 2016. DOI: 10.1109/ICPP.2016.27
Citations: 1

Abstract

Graphics processing units (GPUs) have recently evolved into popular accelerators for general-purpose parallel programs, an approach known as GPGPU computing. Although programming models such as CUDA and OpenCL significantly improve GPGPU programmability, optimizing GPGPU programs is still far from trivial. Branch divergence is one of the root causes of reduced GPGPU performance. Existing approaches can calculate the branch divergence rate but cannot reveal how branches diverge in a GPGPU program. In this paper, we propose the Thread Similarity Matrix (TSM) to visualize how branches diverge and thereby help find optimization opportunities. TSM contains an element for each pair of threads that represents the difference in the code executed by those two threads. The darker an element, the more similar the two threads; the lighter, the more dissimilar. TSM therefore lets GPGPU programmers easily understand an application's branch divergence behavior and pinpoint performance anomalies. We present a case study to demonstrate how TSM can help optimize GPGPU programs: we improve the performance of a highly optimized GPGPU kernel by 35% by reorganizing its threads to reduce its branch divergence rate.
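The abstract does not say how the "difference in code" between two threads is measured. As a rough illustration only, the Python sketch below assumes that each thread is summarized by a hypothetical vector of per-basic-block execution counts and that the difference between two threads is the Manhattan (L1) distance between those vectors. The function names, the four-block toy kernel and the 4-thread warp (real NVIDIA warps have 32 threads) are illustrative assumptions, not the authors' method. The sketch builds the pairwise matrix, prints it as an ASCII heat map with darker glyphs for more similar pairs, and counts warps that mix execution paths before and after regrouping threads by path, loosely mirroring the thread reorganization described in the case study.

```python
from typing import List, Sequence

# Assumption (not from the paper): a thread's profile is the number of times
# it executed each basic block of the kernel (index = basic-block id).
Profile = Sequence[int]


def distance(a: Profile, b: Profile) -> int:
    """Manhattan (L1) distance; 0 means both threads ran identical code."""
    return sum(abs(x - y) for x, y in zip(a, b))


def thread_similarity_matrix(profiles: List[Profile]) -> List[List[int]]:
    """Element [i][j] holds the code difference between threads i and j."""
    return [[distance(p, q) for q in profiles] for p in profiles]


def render(tsm: List[List[int]], shades: str = "#+. ") -> str:
    """ASCII heat map: darker glyphs for similar pairs, lighter for dissimilar."""
    top = max(max(row) for row in tsm) or 1  # avoid division by zero
    steps = len(shades) - 1
    return "\n".join(
        "".join(shades[round(v * steps / top)] for v in row) for row in tsm
    )


def divergent_warps(profiles: List[Profile], warp_size: int) -> int:
    """Count warps whose threads do not all share the same profile."""
    return sum(
        any(p != profiles[s] for p in profiles[s:s + warp_size])
        for s in range(0, len(profiles), warp_size)
    )


# Hypothetical toy kernel with blocks [entry, then-branch, else-branch, exit]:
# even threads take the then-branch, odd threads take the else-branch.
threads = [[1, 1, 0, 1] if t % 2 == 0 else [1, 0, 1, 1] for t in range(8)]

print(render(thread_similarity_matrix(threads)))
print("divergent warps:", divergent_warps(threads, warp_size=4))

# Regroup threads so that threads following the same path share a warp.
regrouped = sorted(threads, reverse=True)
print(render(thread_similarity_matrix(regrouped)))
print("divergent warps:", divergent_warps(regrouped, warp_size=4))
```

With the interleaved assignment the heat map is a checkerboard and both warps diverge; after regrouping it becomes block-diagonal (two dark 4x4 squares) and no warp diverges. On a real GPU, such regrouping means changing how data items are mapped to thread indices, which may also affect memory access patterns such as coalescing.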