Automatic CPU-GPU Allocation for Graph Execution

2023 31st Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP); Published: 2023-03-01; DOI: 10.1109/PDP59025.2023.00013

Marcelo K. Moori, Hiago Mayk G. de A. Rocha, Matheus A. Silva, Janaina Schwarzrock, A. Lorenzon, A. C. S. Beck


Citations: 1

Abstract

Although advances in modern GPUs have accelerated heavy data-processing applications, speeding up graph processing on these systems is not trivial. Graph applications are characterized by a high volume of irregular memory accesses that vary with the graph structure, so they often fail to reach peak performance on GPUs; in these cases, CPU execution is more suitable. Because graph structure can be characterized through high-level metrics (e.g., diameter and average clustering coefficient), these metrics can help the designer decide where to execute a given input graph (GPU or CPU). Based on this, we propose GraCo, a graph processing framework that decides where to process each application in a batch of graph applications. Whenever a new batch is submitted to the target HPC system, GraCo selects the best machine for each application based only on the available high-level features, without requiring any additional executions of the applications. Experimental results on an HPC system composed of 4 CPUs and 3 GPUs show that GraCo outperforms three other strategies by at least 34.94×, 13.59×, and 492.31× in total execution time, energy, and energy-delay product, respectively.
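The abstract does not specify how GraCo turns high-level graph metrics into a placement decision. The following is a minimal Python sketch, under stated assumptions, of that kind of pipeline: it computes two of the metrics the abstract names (diameter and average clustering coefficient) from an edge list, applies a rule to pick CPU or GPU for each application in a batch, and computes the energy-delay product (EDP, energy multiplied by execution time) used as one of the evaluation metrics. The threshold values, the direction of the rule, and the names `choose_device`, `allocate_batch`, `MAX_GPU_DIAMETER`, and `MAX_GPU_CLUSTERING` are hypothetical and are not taken from the paper.

```python
from collections import deque

# Hypothetical thresholds: the abstract does not describe GraCo's decision
# model, so these values (and the rule direction) are placeholders only.
MAX_GPU_DIAMETER = 20
MAX_GPU_CLUSTERING = 0.5


def build_adjacency(edges):
    """Build undirected adjacency sets from an edge list, ignoring self-loops."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set())
        adj.setdefault(v, set())
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def average_clustering(adj):
    """Mean local clustering coefficient; nodes of degree < 2 contribute 0."""
    if not adj:
        return 0.0
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Each edge between two neighbours is seen from both ends, hence // 2.
        links = sum(len(adj[n] & nbrs) for n in nbrs) // 2
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def diameter(adj):
    """Largest BFS eccentricity over all nodes (exact, O(V * E)).

    For a disconnected graph this is the largest finite shortest-path length.
    """
    best = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        best = max(best, max(dist.values()))
    return best


def choose_device(edges):
    """Hypothetical rule: use the GPU only when both features fall below the
    placeholder thresholds; otherwise fall back to the CPU."""
    adj = build_adjacency(edges)
    if (diameter(adj) <= MAX_GPU_DIAMETER
            and average_clustering(adj) <= MAX_GPU_CLUSTERING):
        return "GPU"
    return "CPU"


def allocate_batch(batch):
    """Map each (app_name, edge_list) pair in a submitted batch to a device."""
    return {name: choose_device(edges) for name, edges in batch}


def energy_delay_product(energy_joules, time_seconds):
    """EDP = energy x execution time (lower is better)."""
    return energy_joules * time_seconds


if __name__ == "__main__":
    path = [(i, i + 1) for i in range(30)]   # long chain: diameter 30
    star = [(0, i) for i in range(1, 6)]     # hub and 5 leaves: diameter 2
    print(allocate_batch([("bfs_path", path), ("pr_star", star)]))
    # -> {'bfs_path': 'CPU', 'pr_star': 'GPU'}
    print(energy_delay_product(120.0, 2.5))  # -> 300.0
```

Exact diameter via BFS from every node costs O(V·E), so for large inputs one would likely precompute it or use an approximation; this sketch does not attempt either.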


