Parallel graph algorithms by blocks: from I/O to algorithms

Proceedings of the 18th ACM International Conference on Computing Frontiers. Pub Date: 2021-05-11. DOI: 10.1145/3457388.3459987

Abdurrahman Yasar, Kasimir Gabert, Ümit V. Çatalyürek



Abstract

In today's data-driven world and heterogeneous computing environments, processing large-scale graphs in an architecture-agnostic manner has become more crucial than ever before. Among graph analytics frameworks, on the HPC side, there has been significant interest in developing hand-optimized high-performance computing solutions. On the systems side, following the big data movement and to bring parallel computing to the masses, researchers have proposed several graph processing and management systems to handle large-scale graphs. Hand-optimized HPC approaches require high expertise and are expensive to develop and maintain, while graph processing frameworks suffer from limited expressibility and performance. We propose Parallel Graph Algorithms by Blocks (PGAbB), a block-based graph algorithms framework for shared-memory, multi-core, multi-GPU machines. PGAbB offers a sweet spot between efficient parallelism and architecture-agnostic algorithm design for a wide class of graph problems while performing close to hand-optimized HPC implementations. Although PGAbB and many other recent HPC graph-analytics frameworks are highly tuned and can run complex graph analytics in fractions of a second on billion-edge graphs, there remains a gap in their end-to-end use. Despite the significant improvements that modern hardware and operating systems have made to input and output, reading the graph from a file system easily takes thousands of times longer than running the computational kernel itself. This slowdown causes both a disconnect for end users and a loss of productivity for researchers and developers. We close this gap by providing PIGO, a simple-to-use, small, header-only, and dependency-free C++11 library that brings I/O improvements to graph and sparse matrix systems. Using PIGO, we improve the end-to-end performance of state-of-the-art systems significantly, in many cases by over 40X.
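The abstract does not say how PGAbB forms its blocks, so the following is only a minimal sketch of one common reading of "block-based": assume the adjacency matrix is cut into a `parts` x `parts` grid, and each edge is assigned to the block given by the ranges of its source and destination vertices. All names here (`block_of`, `partition_edges`, `out_degrees_by_block`) are hypothetical and are not part of PGAbB or PIGO.

```python
from collections import defaultdict


def block_of(vertex, num_vertices, parts):
    """Map a vertex id to a block index using equal-width id ranges."""
    width = -(-num_vertices // parts)  # ceiling division
    return vertex // width


def partition_edges(edges, num_vertices, parts):
    """Group edges into a grid keyed by (source block, destination block)."""
    blocks = defaultdict(list)
    for src, dst in edges:
        key = (block_of(src, num_vertices, parts),
               block_of(dst, num_vertices, parts))
        blocks[key].append((src, dst))
    return dict(blocks)


def out_degrees_by_block(blocks, num_vertices):
    """Toy kernel: accumulate out-degrees one block at a time.

    Blocks are visited sequentially here. Blocks in different source
    rows update disjoint slices of `degrees`, which is the property a
    parallel scheduler could exploit.
    """
    degrees = [0] * num_vertices
    for block_edges in blocks.values():
        for src, _ in block_edges:
            degrees[src] += 1
    return degrees
```

With four vertices and `parts=2`, vertices 0 and 1 fall in block 0 and vertices 2 and 3 in block 1, so an edge such as `(1, 2)` lands in block `(0, 1)`. A real framework would also balance block sizes and schedule blocks across CPU cores and GPUs, which this sketch does not attempt.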

