WER: Maximizing Parallelism of Irregular Graph Applications Through GPU Warp EqualizeR

En-Ming Huang, Bo Wun Cheng, Meng-Hsien Lin, Chun-Yi Lee, Tsung-Tai Yeh
2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC), pp. 201-206. Published 2024-01-22. DOI: 10.1109/ASP-DAC58780.2024.10473955 (https://doi.org/10.1109/ASP-DAC58780.2024.10473955)

Abstract

Irregular graphs are becoming increasingly prevalent across a broad spectrum of data analysis applications. Despite their versatility, the inherent complexity and irregularity of these graphs often result in the underutilization of Single Instruction, Multiple Data (SIMD) resources when processed on Graphics Processing Units (GPUs). This underutilization originates from two primary issues: inactive threads and intra-warp load imbalance. Both leave SIMD lanes idle, which reduces throughput and increases program execution time. To address these challenges, we introduce Warp EqualizeR (WER), a framework designed to optimize the utilization of SIMD resources on a GPU when processing irregular graphs. WER combines a software API with a specifically tailored hardware microarchitecture. This combined approach enables workload redistribution in irregular graphs, which allows WER to improve SIMD lane utilization. Our experimental results over seven different graph applications indicate that WER yields geometric mean speedups of $2.52\times$ and $1.47\times$ over the baseline GPU and existing state-of-the-art methodologies, respectively.
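To make the two issues concrete, the following is a minimal sketch of how inactive threads and intra-warp load imbalance lower SIMD lane utilization. It is an illustrative model only and not WER's mechanism, which the abstract does not detail. The vertex-per-thread mapping, the function names `lane_utilization` and `balanced_utilization`, and the example degree list are all assumptions made for this sketch.

```python
from math import ceil

WARP_SIZE = 32  # threads per warp on current NVIDIA GPUs


def lane_utilization(degrees, warp_size=WARP_SIZE):
    """Fraction of lane-slots doing useful work under a vertex-per-thread mapping.

    Assumption: each active thread owns one vertex and loops over its edges,
    one edge per step. Lanes without a vertex are inactive for the whole run.
    """
    if len(degrees) > warp_size:
        raise ValueError("at most one vertex per lane")
    steps = max(degrees, default=0)  # the warp runs until its slowest lane finishes
    if steps == 0:
        return 0.0
    useful = sum(degrees)  # one lane-slot per edge processed
    return useful / (steps * warp_size)


def balanced_utilization(degrees, warp_size=WARP_SIZE):
    """Idealized upper bound: the same edges spread evenly over all lanes."""
    total = sum(degrees)
    if total == 0:
        return 0.0
    steps = ceil(total / warp_size)
    return total / (steps * warp_size)


# Hypothetical warp: one high-degree vertex, 15 low-degree vertices, 16 idle lanes.
degrees = [64] + [1] * 15
skewed = lane_utilization(degrees)
balanced = balanced_utilization(degrees)
print(f"vertex-per-thread: {skewed:.3f}, evenly redistributed: {balanced:.3f}")
```

In this hypothetical warp, a single high-degree vertex keeps the whole warp busy for 64 steps while the other lanes finish after one step or never start, so most lane-slots go unused. Spreading the same 79 edges evenly across 32 lanes would need only 3 steps, which is the kind of gap that workload redistribution aims to close.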