Atos: A Task-Parallel GPU Scheduler for Graph Analytics

Yuxin Chen, Benjamin Brock, Serban D. Porumbescu, A. Buluç, K. Yelick, J. Owens
{"title":"Atos: A Task-Parallel GPU Scheduler for Graph Analytics","authors":"Yuxin Chen, Benjamin Brock, Serban D. Porumbescu, A. Buluç, K. Yelick, J. Owens","doi":"10.1145/3545008.3545056","DOIUrl":null,"url":null,"abstract":"We present Atos, a task-parallel GPU dynamic scheduling framework that is especially targeted at dynamic irregular applications. Compared to the dominant Bulk Synchronous Parallel (BSP) frameworks, Atos exposes additional concurrency by supporting task-parallel formulations of applications with relaxed dependencies, achieving higher GPU utilization, which is particularly significant for problems with concurrency bottlenecks. Atos also offers implicit task-parallel load balancing in addition to data-parallel load balancing, providing users the flexibility to balance between them to achieve optimal performance. Finally, Atos allows users to adapt to different use cases by controlling the kernel strategy and task-parallel granularity. We demonstrate that each of these controls is important in practice. We evaluate and analyze the performance of Atos vs. BSP on three applications: breadth-first search, PageRank, and graph coloring. Atos implementations achieve geomean speedups of 3.44x, 2.1x, and 2.77x and peak speedups of 12.8x, 3.2x, and 9.08x across three case studies, compared to a state-of-the-art BSP GPU implementation. Beyond simply quantifying the speedup, we extensively analyze the reasons behind each speedup. This deeper understanding allows us to derive general guidelines for how to select the optimal Atos configuration for different applications. Finally, our analysis provides insights for future dynamic scheduling framework designs.","PeriodicalId":360504,"journal":{"name":"Proceedings of the 51st International Conference on Parallel Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 51st International Conference on Parallel Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3545008.3545056","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

We present Atos, a task-parallel GPU dynamic scheduling framework that is especially targeted at dynamic irregular applications. Compared to the dominant Bulk Synchronous Parallel (BSP) frameworks, Atos exposes additional concurrency by supporting task-parallel formulations of applications with relaxed dependencies, achieving higher GPU utilization, which is particularly significant for problems with concurrency bottlenecks. Atos also offers implicit task-parallel load balancing in addition to data-parallel load balancing, providing users the flexibility to balance between them to achieve optimal performance. Finally, Atos allows users to adapt to different use cases by controlling the kernel strategy and task-parallel granularity. We demonstrate that each of these controls is important in practice. We evaluate and analyze the performance of Atos vs. BSP on three applications: breadth-first search, PageRank, and graph coloring. Atos implementations achieve geomean speedups of 3.44x, 2.1x, and 2.77x and peak speedups of 12.8x, 3.2x, and 9.08x across three case studies, compared to a state-of-the-art BSP GPU implementation. Beyond simply quantifying the speedup, we extensively analyze the reasons behind each speedup. This deeper understanding allows us to derive general guidelines for how to select the optimal Atos configuration for different applications. Finally, our analysis provides insights for future dynamic scheduling framework designs.
Atos:用于图形分析的任务并行GPU调度程序
我们提出了Atos,一个任务并行GPU动态调度框架,特别针对动态不规则应用程序。与主流的批量同步并行(BSP)框架相比,Atos通过支持具有宽松依赖关系的应用程序的任务并行公式暴露了额外的并发性,实现了更高的GPU利用率,这对于并发瓶颈问题尤为重要。除了数据并行负载平衡之外,Atos还提供了隐式任务并行负载平衡,为用户提供了在它们之间进行平衡以实现最佳性能的灵活性。最后,Atos允许用户通过控制内核策略和任务并行粒度来适应不同的用例。我们将演示这些控件中的每一个在实践中都是重要的。我们评估和分析了Atos和BSP在三个应用上的性能:广度优先搜索、PageRank和图形着色。与最先进的BSP GPU实现相比,在三个案例研究中,Atos实现的几何加速分别为3.44倍、2.1倍和2.77倍,峰值加速分别为12.8倍、3.2倍和9.08倍。除了简单地量化加速之外,我们还广泛地分析了每次加速背后的原因。这种更深入的理解使我们能够得出如何为不同的应用程序选择最佳的Atos配置的一般指导方针。最后,我们的分析为未来的动态调度框架设计提供了见解。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信