Accelerating Large-Scale Single-Source Shortest Path on FPGA

2015 IEEE International Parallel and Distributed Processing Symposium Workshop. Pub Date: 2015-05-25. DOI: 10.1109/IPDPSW.2015.130

Shijie Zhou, C. Chelmis, V. Prasanna


Citations: 35

Abstract

Many real-world problems can be represented as graphs and solved by graph traversal algorithms. Single-Source Shortest Path (SSSP) is a fundamental graph algorithm. Today, large-scale graphs involve millions or even billions of vertices, making efficient parallel graph processing challenging. In this paper, we propose a single-FPGA design to accelerate SSSP on massive graphs. We adopt the well-known Bellman-Ford algorithm. In the proposed design, the graph is stored in external memory, which is more realistic for processing large-scale graphs. Our design exploits the available external memory bandwidth to achieve the maximum data parallelism it allows, concurrently processing multiple edges in each clock cycle regardless of data dependencies. The performance of our design is also independent of the graph structure. We propose an optimized data layout that enables efficient utilization of external memory bandwidth. We prototype our design on a state-of-the-art FPGA. Experimental results show that our design can process 1.6 billion traversed edges per second (1.6 GTEPS) on a single FPGA while achieving a high clock rate of over 200 MHz. This performance would place us 131st on the Graph 500 benchmark list of supercomputing systems for data-intensive applications. Our solution therefore provides performance comparable to state-of-the-art systems.
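The abstract says the design uses Bellman-Ford, keeps the graph in external memory, and processes several edges per clock cycle. The Python sketch below is a software model built on stated assumptions. It is not the authors' FPGA design.

The sketch streams an edge list, which stands in for the edge data held in external memory, in chunks of `lanes` edges. `lanes` is a hypothetical count of edges evaluated together per cycle. Inside a chunk, every edge reads the distances as they were when the chunk started. Updates aimed at the same destination are then merged by taking the minimum. This conflict rule is an assumption made for illustration, because the abstract does not describe how the hardware resolves dependencies. The name `bellman_ford_chunked` and its parameters are also hypothetical.

```python
import math
from typing import Dict, List, Tuple

# (source vertex, destination vertex, edge weight)
Edge = Tuple[int, int, float]


def bellman_ford_chunked(num_vertices: int, edges: List[Edge], source: int,
                         lanes: int = 4) -> Tuple[List[float], int]:
    """Hypothetical software model of edge-centric Bellman-Ford.

    Streams `edges` in chunks of `lanes` edges (a stand-in for edges processed
    per clock cycle). Returns (distances, edges_streamed). Raises ValueError if
    a negative cycle is reachable from `source`.
    """
    if lanes < 1:
        raise ValueError("lanes must be >= 1")
    dist = [math.inf] * num_vertices
    dist[source] = 0.0
    edges_streamed = 0

    # Bellman-Ford needs at most |V| - 1 passes; stop early once a pass is idle.
    for _ in range(num_vertices - 1):
        changed = False
        for start in range(0, len(edges), lanes):
            chunk = edges[start:start + lanes]
            # Every lane reads the distances as they were at the start of the chunk.
            proposals: Dict[int, float] = {}
            for u, v, w in chunk:
                candidate = dist[u] + w
                # Assumed conflict rule: same-destination updates merge with min.
                if candidate < dist[v] and candidate < proposals.get(v, math.inf):
                    proposals[v] = candidate
            # Write back the merged updates only after the whole chunk is evaluated.
            for v, candidate in proposals.items():
                dist[v] = candidate
                changed = True
            edges_streamed += len(chunk)
        if not changed:
            break

    # Extra check pass: any further improvement implies a reachable negative cycle.
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            raise ValueError("negative cycle reachable from source")
    return dist, edges_streamed


if __name__ == "__main__":
    # Small example graph: edges are (source, destination, weight).
    example_edges = [(0, 1, 4.0), (0, 2, 1.0), (2, 1, 2.0), (1, 3, 1.0), (2, 3, 5.0)]
    dist, streamed = bellman_ford_chunked(4, example_edges, source=0, lanes=2)
    print(dist, streamed)  # prints [0.0, 3.0, 1.0, 4.0] 15
```

Each pass streams every edge, whichever vertices changed in the previous pass. As a result, the work per pass depends only on the number of edges. That is one plausible reading of the claim that performance is independent of graph structure, although the number of passes still depends on the graph.

A rough consistency check on the headline figures: 1.6 × 10^9 edges/s divided by a 200 MHz clock gives 8 edges per cycle. Since the reported clock exceeds 200 MHz, the implied average would be somewhat below 8, assuming the throughput figure counts edges streamed through the pipeline.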
