异构平台上的广度优先搜索:以社交网络为例

2016 28th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) Pub Date : 2016-10-01 DOI:10.1109/SBAC-PAD.2016.23

Luis Remis, M. Garzarán, R. Asenjo, A. Navarro

{"title":"异构平台上的广度优先搜索:以社交网络为例","authors":"Luis Remis, M. Garzarán, R. Asenjo, A. Navarro","doi":"10.1109/SBAC-PAD.2016.23","DOIUrl":null,"url":null,"abstract":"Breadth-First Search (BFS) is the core of many graph analysis algorithms and it is used in many problems, such as social network, computer network analysis, and data organization. BFS is an iterative algorithm that due to its irregular behavior is quite challenging to parallelize. Several approaches implement efficient algorithms for BFS for multicore architectures and for Graphics Processors, but it is still an open problem how to distribute the work among the main cores and the accelerators. In this paper, we assess several approaches to perform BFS on different heterogenous architectures (highend and embedded mobile processors composed of a multi-core CPU and an integrated GPU) with a focus on social network graphs. In particular, we propose two heterogenous approaches to exploit both devices. The first one, called Selective, selects on which device to execute each iteration. It is based on a previous approach, but we have adapted it to take advantage of the features of social network graphs (fewer iterations but more unbalanced). The second approach, referred as Concurrent, allows the execution of specific iterations concurrently in both devices. Our heterogenous implementations can be up to 1.56x faster and 1.32x more energy efficient with respect to the best of only-CPU or only-GPU baselines. We have also found that for a highly memory bound problem like BFS, the CPU-GPU collaborative execution is limited by the shared-memory bus bandwidth.","PeriodicalId":361160,"journal":{"name":"2016 28th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"Breadth-First Search on Heterogeneous Platforms: A Case of Study on Social Networks\",\"authors\":\"Luis Remis, M. Garzarán, R. Asenjo, A. Navarro\",\"doi\":\"10.1109/SBAC-PAD.2016.23\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Breadth-First Search (BFS) is the core of many graph analysis algorithms and it is used in many problems, such as social network, computer network analysis, and data organization. BFS is an iterative algorithm that due to its irregular behavior is quite challenging to parallelize. Several approaches implement efficient algorithms for BFS for multicore architectures and for Graphics Processors, but it is still an open problem how to distribute the work among the main cores and the accelerators. In this paper, we assess several approaches to perform BFS on different heterogenous architectures (highend and embedded mobile processors composed of a multi-core CPU and an integrated GPU) with a focus on social network graphs. In particular, we propose two heterogenous approaches to exploit both devices. The first one, called Selective, selects on which device to execute each iteration. It is based on a previous approach, but we have adapted it to take advantage of the features of social network graphs (fewer iterations but more unbalanced). The second approach, referred as Concurrent, allows the execution of specific iterations concurrently in both devices. Our heterogenous implementations can be up to 1.56x faster and 1.32x more energy efficient with respect to the best of only-CPU or only-GPU baselines. We have also found that for a highly memory bound problem like BFS, the CPU-GPU collaborative execution is limited by the shared-memory bus bandwidth.\",\"PeriodicalId\":361160,\"journal\":{\"name\":\"2016 28th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)\",\"volume\":\"58 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 28th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SBAC-PAD.2016.23\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 28th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SBAC-PAD.2016.23","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 7

摘要

广度优先搜索(BFS)是许多图分析算法的核心，它被用于许多问题，如社会网络、计算机网络分析和数据组织。BFS是一种迭代算法，由于其不规则性，使得并行化具有很大的挑战性。对于多核架构和图形处理器的BFS，有几种方法实现了高效的算法，但如何在主核和加速器之间分配工作仍然是一个悬而未决的问题。在本文中，我们评估了几种在不同异构架构(由多核CPU和集成GPU组成的高端和嵌入式移动处理器)上执行BFS的方法，重点是社交网络图。特别是，我们提出了两种异构方法来利用这两种设备。第一个称为Selective，选择在哪个设备上执行每个迭代。它基于之前的方法，但我们对其进行了调整，以利用社交网络图的特性(迭代更少，但更不平衡)。第二种方法称为Concurrent，它允许在两个设备中并发地执行特定的迭代。相对于最好的cpu或gpu基准，我们的异构实现可以提高1.56倍的速度和1.32倍的能效。我们还发现，对于像BFS这样的内存高度受限的问题，CPU-GPU协同执行受到共享内存总线带宽的限制。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Breadth-First Search on Heterogeneous Platforms: A Case of Study on Social Networks

Breadth-First Search (BFS) is the core of many graph analysis algorithms and it is used in many problems, such as social network, computer network analysis, and data organization. BFS is an iterative algorithm that due to its irregular behavior is quite challenging to parallelize. Several approaches implement efficient algorithms for BFS for multicore architectures and for Graphics Processors, but it is still an open problem how to distribute the work among the main cores and the accelerators. In this paper, we assess several approaches to perform BFS on different heterogenous architectures (highend and embedded mobile processors composed of a multi-core CPU and an integrated GPU) with a focus on social network graphs. In particular, we propose two heterogenous approaches to exploit both devices. The first one, called Selective, selects on which device to execute each iteration. It is based on a previous approach, but we have adapted it to take advantage of the features of social network graphs (fewer iterations but more unbalanced). The second approach, referred as Concurrent, allows the execution of specific iterations concurrently in both devices. Our heterogenous implementations can be up to 1.56x faster and 1.32x more energy efficient with respect to the best of only-CPU or only-GPU baselines. We have also found that for a highly memory bound problem like BFS, the CPU-GPU collaborative execution is limited by the shared-memory bus bandwidth.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2016 28th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)

自引率

0.00%

发文量