利用规律加速gpu上的树遍历

Feng Zhang, Peng Di, Hao Zhou, Xiangke Liao, Jingling Xue
{"title":"利用规律加速gpu上的树遍历","authors":"Feng Zhang, Peng Di, Hao Zhou, Xiangke Liao, Jingling Xue","doi":"10.1109/ICPP.2016.71","DOIUrl":null,"url":null,"abstract":"Tree traversals are widely used irregular applications. Given a tree traversal algorithm, where a single tree is traversed by multiple queries (with truncation), its efficient parallelization on GPUs is hindered by branch divergence, load imbalance and memory-access irregularity, as the nodes and their visitation orders differ greatly under different queries. We leverage a key insight made on several truncation-induced tree traversal regularities to enable as many threads in the same warp as possible to visit the same node simultaneously, thereby enhancing both GPU resource utilization and memory coalescing at the same time. We introduce a new parallelization approach, RegTT, to orchestrate an efficient execution of a tree traversal algorithm on GPUs by starting with BFT (Breadth-First Traversal), then reordering the queries being processed (based on their truncation histories), and finally, switching to DFT (Depth-First Traversal). RegTT is general (without relying on domain-specific knowledge) and automatic (as a source-code transformation). For a set of five representative benchmarks used, RegTT outperforms the state-of-the-art by 1.66x on average.","PeriodicalId":409991,"journal":{"name":"2016 45th International Conference on Parallel Processing (ICPP)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2016-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"RegTT: Accelerating Tree Traversals on GPUs by Exploiting Regularities\",\"authors\":\"Feng Zhang, Peng Di, Hao Zhou, Xiangke Liao, Jingling Xue\",\"doi\":\"10.1109/ICPP.2016.71\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Tree traversals are widely used irregular applications. Given a tree traversal algorithm, where a single tree is traversed by multiple queries (with truncation), its efficient parallelization on GPUs is hindered by branch divergence, load imbalance and memory-access irregularity, as the nodes and their visitation orders differ greatly under different queries. We leverage a key insight made on several truncation-induced tree traversal regularities to enable as many threads in the same warp as possible to visit the same node simultaneously, thereby enhancing both GPU resource utilization and memory coalescing at the same time. We introduce a new parallelization approach, RegTT, to orchestrate an efficient execution of a tree traversal algorithm on GPUs by starting with BFT (Breadth-First Traversal), then reordering the queries being processed (based on their truncation histories), and finally, switching to DFT (Depth-First Traversal). RegTT is general (without relying on domain-specific knowledge) and automatic (as a source-code transformation). For a set of five representative benchmarks used, RegTT outperforms the state-of-the-art by 1.66x on average.\",\"PeriodicalId\":409991,\"journal\":{\"name\":\"2016 45th International Conference on Parallel Processing (ICPP)\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-08-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 45th International Conference on Parallel Processing (ICPP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICPP.2016.71\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 45th International Conference on Parallel Processing (ICPP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICPP.2016.71","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

摘要

树遍历在不规则应用中被广泛使用。给定一种树遍历算法,其中单个树由多个查询遍历(带截断),由于不同查询下的节点及其访问顺序差异很大,分支发散、负载不平衡和内存访问不规则会阻碍其在gpu上的高效并行化。我们利用对几个截断引起的树遍历规律的关键洞察,使尽可能多的线程在相同的warp中同时访问相同的节点,从而同时提高GPU资源利用率和内存合并。我们引入了一种新的并行化方法,RegTT,通过从BFT(宽度优先遍历)开始,然后重新排序正在处理的查询(基于它们的截断历史),最后切换到DFT(深度优先遍历),来协调gpu上树遍历算法的有效执行。RegTT是通用的(不依赖于特定于领域的知识)和自动的(作为源代码转换)。对于所使用的五个代表性基准测试,RegTT的性能平均比最先进的测试高出1.66倍。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
RegTT: Accelerating Tree Traversals on GPUs by Exploiting Regularities
Tree traversals are widely used irregular applications. Given a tree traversal algorithm, where a single tree is traversed by multiple queries (with truncation), its efficient parallelization on GPUs is hindered by branch divergence, load imbalance and memory-access irregularity, as the nodes and their visitation orders differ greatly under different queries. We leverage a key insight made on several truncation-induced tree traversal regularities to enable as many threads in the same warp as possible to visit the same node simultaneously, thereby enhancing both GPU resource utilization and memory coalescing at the same time. We introduce a new parallelization approach, RegTT, to orchestrate an efficient execution of a tree traversal algorithm on GPUs by starting with BFT (Breadth-First Traversal), then reordering the queries being processed (based on their truncation histories), and finally, switching to DFT (Depth-First Traversal). RegTT is general (without relying on domain-specific knowledge) and automatic (as a source-code transformation). For a set of five representative benchmarks used, RegTT outperforms the state-of-the-art by 1.66x on average.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信