配置图形处理器的图形遍历应用:实现策略及其与图形特性的相关性分析

2019 International Conference on High Performance Computing & Simulation (HPCS) Pub Date : 2019-07-01 DOI:10.1109/HPCS48598.2019.9188204

F. Busato, N. Bombieri

{"title":"配置图形处理器的图形遍历应用:实现策略及其与图形特性的相关性分析","authors":"F. Busato, N. Bombieri","doi":"10.1109/HPCS48598.2019.9188204","DOIUrl":null,"url":null,"abstract":"Implementing a graph traversal (GT) algorithm for GPUs is a very challenging task. It is a core primitive for many graph analysis applications and its efficiency strongly impacts on the overall application performance. Different strategies have been proposed to implement the GT algorithm by exploiting the GPU characteristics. Nevertheless, the efficiency of each of them strongly depends on the graph characteristics. This paper presents an analysis of the most important features of the parallel GT algorithm, which include frontier queue management, load balancing, duplicate removing, and synchronization during graph traversal iterations. It shows different techniques to implement each of such features for GPUs and the comparison of their performance when applied on a very large and heterogeneous set of graphs. The results allow identifying, for each feature and among different implementation techniques of them, the best configuration to address the graph characteristics. The paper finally presents how such a configuration analysis and set allow traversing graphs with throughput up to 14,000 MTEPS on single GPU devices, with speedups ranging from 1.2x to 18.5x with regard to the best parallel applications for GT on GPUs at the state of the art.","PeriodicalId":371856,"journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)","volume":"73 2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Configuring Graph Traversal Applications for GPUs: Analysis of Implementation Strategies and their Correlation with Graph Characteristics\",\"authors\":\"F. Busato, N. Bombieri\",\"doi\":\"10.1109/HPCS48598.2019.9188204\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Implementing a graph traversal (GT) algorithm for GPUs is a very challenging task. It is a core primitive for many graph analysis applications and its efficiency strongly impacts on the overall application performance. Different strategies have been proposed to implement the GT algorithm by exploiting the GPU characteristics. Nevertheless, the efficiency of each of them strongly depends on the graph characteristics. This paper presents an analysis of the most important features of the parallel GT algorithm, which include frontier queue management, load balancing, duplicate removing, and synchronization during graph traversal iterations. It shows different techniques to implement each of such features for GPUs and the comparison of their performance when applied on a very large and heterogeneous set of graphs. The results allow identifying, for each feature and among different implementation techniques of them, the best configuration to address the graph characteristics. The paper finally presents how such a configuration analysis and set allow traversing graphs with throughput up to 14,000 MTEPS on single GPU devices, with speedups ranging from 1.2x to 18.5x with regard to the best parallel applications for GT on GPUs at the state of the art.\",\"PeriodicalId\":371856,\"journal\":{\"name\":\"2019 International Conference on High Performance Computing & Simulation (HPCS)\",\"volume\":\"73 2 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 International Conference on High Performance Computing & Simulation (HPCS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HPCS48598.2019.9188204\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 International Conference on High Performance Computing & Simulation (HPCS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCS48598.2019.9188204","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

在gpu上实现图形遍历(GT)算法是一项非常具有挑战性的任务。它是许多图形分析应用程序的核心原语，它的效率对整个应用程序的性能有很大的影响。利用GPU的特性，提出了不同的策略来实现GT算法。然而，它们的效率很大程度上取决于图的特征。本文分析了并行GT算法的主要特征，包括前沿队列管理、负载平衡、重复删除和图遍历迭代中的同步。它展示了为gpu实现这些特性的不同技术，并在应用于非常大且异构的图形集时比较了它们的性能。结果允许识别每个特征以及它们的不同实现技术，以解决图形特征的最佳配置。本文最后介绍了这样的配置分析和设置如何允许在单个GPU设备上以高达14,000 MTEPS的吞吐量遍历图形，速度范围从1.2倍到18.5倍不等，考虑到目前最先进的GPU上的GT并行应用程序。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Configuring Graph Traversal Applications for GPUs: Analysis of Implementation Strategies and their Correlation with Graph Characteristics

Implementing a graph traversal (GT) algorithm for GPUs is a very challenging task. It is a core primitive for many graph analysis applications and its efficiency strongly impacts on the overall application performance. Different strategies have been proposed to implement the GT algorithm by exploiting the GPU characteristics. Nevertheless, the efficiency of each of them strongly depends on the graph characteristics. This paper presents an analysis of the most important features of the parallel GT algorithm, which include frontier queue management, load balancing, duplicate removing, and synchronization during graph traversal iterations. It shows different techniques to implement each of such features for GPUs and the comparison of their performance when applied on a very large and heterogeneous set of graphs. The results allow identifying, for each feature and among different implementation techniques of them, the best configuration to address the graph characteristics. The paper finally presents how such a configuration analysis and set allow traversing graphs with throughput up to 14,000 MTEPS on single GPU devices, with speedups ranging from 1.2x to 18.5x with regard to the best parallel applications for GT on GPUs at the state of the art.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2019 International Conference on High Performance Computing & Simulation (HPCS)

自引率

0.00%

发文量