Automatic CPU-GPU Allocation for Graph Execution

Marcelo K. Moori, Hiago Mayk G. de A. Rocha, Matheus A. Silva, Janaina Schwarzrock, A. Lorenzon, A. C. S. Beck
Published in: 2023 31st Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), 2023-03-01
DOI: 10.1109/PDP59025.2023.00013 (https://doi.org/10.1109/PDP59025.2023.00013)
Citations: 1

Abstract

Although advances in modern GPUs have accelerated data-intensive applications, speeding up graph processing on these systems is not trivial: graph applications are characterized by a high volume of irregular memory accesses that vary with the graph structure, so in many cases they do not reach peak performance on GPUs. In these cases, CPU execution is more suitable. Because graph structure can be characterized through high-level metrics (e.g., diameter and average clustering coefficient), these metrics may help the designer decide where to execute a given input graph (GPU or CPU). Based on that, we propose GraCo, a graph processing framework that supports the decision of where to process a batch of graph applications. Whenever a new batch is submitted to the target HPC system, GraCo selects the best machine for each application based only on the available high-level features, without requiring any additional application executions. Our experimental results, comparing GraCo with three other strategies on an HPC system comprising 4 CPUs and 3 GPUs, show that GraCo outperforms the other strategies by at least 34.94×, 13.59×, and 492.31× in total execution time, energy, and energy-delay product, respectively.
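The two high-level metrics named in the abstract, diameter and average clustering coefficient, can be computed directly from an adjacency structure. The following is a minimal sketch, not GraCo's implementation: the abstract does not say how the metrics are obtained or how they drive the CPU/GPU decision. The function names, the example edge list, and the assumption of an undirected, connected graph without self-loops are all illustrative.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map {vertex: set(neighbours)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def eccentricity(adj, source):
    """Largest hop distance from source to any reachable vertex (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


def diameter(adj):
    """Longest shortest path; assumes the graph is connected."""
    return max(eccentricity(adj, s) for s in adj)


def average_clustering(adj):
    """Mean local clustering coefficient; degree < 2 contributes 0."""
    total = 0.0
    for u, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        # Ordered pairs of neighbours of u that are themselves adjacent,
        # so every neighbour-neighbour edge is counted twice.
        links = sum(1 for v in nbrs for w in adj[v] if w in nbrs)
        total += links / (k * (k - 1))
    return total / len(adj)


# Hypothetical input: a triangle (0, 1, 2) with a pendant vertex 3.
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
adj = build_adjacency(edges)
features = {
    "diameter": diameter(adj),
    "avg_clustering": average_clustering(adj),
}
print(features)
```

A feature dictionary like this is the kind of input a placement decision could consume. The all-pairs BFS used for the diameter costs O(V·(V+E)), so for large graphs an approximation or precomputed metadata would likely be preferable.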