Title: Automatic CPU-GPU Allocation for Graph Execution
Authors: Marcelo K. Moori, Hiago Mayk G. de A. Rocha, Matheus A. Silva, Janaina Schwarzrock, A. Lorenzon, A. C. S. Beck
DOI: 10.1109/PDP59025.2023.00013
Venue: 2023 31st Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)
Publication date: 2023-03-01
Citations: 1
Abstract
Although advances in modern GPUs have accelerated the execution of data-intensive applications, speeding up graph processing on these systems is not a trivial task: graph applications are characterized by a high volume of irregular memory accesses that varies with the graph structure, so they often fail to reach peak performance when executing on GPUs. In these cases, CPU execution is more suitable. Given that graph structures can be characterized through high-level metrics (e.g., diameter and average clustering coefficient), these metrics may assist the designer in deciding where to execute a given input graph (GPU or CPU). Based on that, in this work we propose GraCo: a graph processing framework that helps decide where to process a batch of graph applications. Whenever a new batch is submitted to the target HPC system, GraCo decides the best machine to execute each application based only on the available high-level features, without requiring any additional application executions. Our experimental results, comparing GraCo with three other strategies on an HPC system comprising 4 CPUs and 3 GPUs, show that GraCo outperforms the other strategies by at least 34.94×, 13.59×, and 492.31× in total execution time, energy, and energy-delay product, respectively.
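The paper itself does not include code, but the core idea of the abstract (routing a graph to the CPU or the GPU based only on high-level structural metrics, with no trial executions) can be illustrated with a minimal sketch. The snippet below assumes NetworkX for metric extraction and uses a hypothetical threshold-based rule; it does not reproduce GraCo's actual decision model or its batch-scheduling logic.

```python
# Minimal sketch (not GraCo's actual model): route a graph to CPU or GPU
# using high-level structural metrics of the kind the abstract mentions.
# Threshold values are illustrative placeholders, not taken from the paper.
import networkx as nx


def extract_features(graph: nx.Graph) -> dict:
    """Compute high-level metrics such as diameter and average clustering."""
    # Diameter is only defined for connected graphs; for this illustration,
    # fall back to the largest connected component.
    largest_cc = max(nx.connected_components(graph), key=len)
    return {
        "nodes": graph.number_of_nodes(),
        "avg_clustering": nx.average_clustering(graph),
        "diameter": nx.diameter(graph.subgraph(largest_cc)),
    }


def choose_device(features: dict) -> str:
    """Hypothetical rule: graphs with highly irregular access patterns
    (large diameter, low clustering) are assumed to favor the CPU,
    while more regular, dense graphs favor the GPU."""
    if features["diameter"] > 20 and features["avg_clustering"] < 0.1:
        return "CPU"
    return "GPU"


if __name__ == "__main__":
    # Example input graph; in practice each graph in the submitted batch
    # would be characterized and assigned to a machine independently.
    g = nx.barabasi_albert_graph(1_000, 3)
    feats = extract_features(g)
    print(f"features: {feats} -> run on {choose_device(feats)}")
```

The key property this sketch tries to capture is that the decision uses only cheap, precomputable graph features, so no application has to be executed on both device types before the batch is scheduled.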