Streaming graph challenge: Stochastic block partition

2017 IEEE High Performance Extreme Computing Conference (HPEC), Pub Date: 2017-08-25, DOI: 10.1109/HPEC.2017.8091040

E. Kao, V. Gadepally, M. Hurley, Michael Jones, J. Kepner, S. Mohindra, P. Monticciolo, A. Reuther, S. Samsi, William S. Song, D. Staheli, S. Smith


Citations: 60

Abstract

An important objective for analyzing real-world graphs is to achieve scalable performance on large, streaming graphs. A challenging and relevant example is the graph partition problem. As a combinatorial problem, graph partition is NP-hard, but existing relaxation methods provide reasonable approximate solutions that scale to large graphs. Competitive benchmarks and challenges have proven to be an effective means to advance state-of-the-art performance and foster community collaboration. This paper describes a graph partition challenge with a baseline partition algorithm of sub-quadratic complexity. The algorithm employs rigorous Bayesian inferential methods based on a statistical model that captures characteristics of real-world graphs. This strong foundation enables the algorithm to address limitations of well-known graph partition approaches such as modularity maximization. This paper describes various aspects of the challenge, including: (1) the data sets and streaming graph generator, (2) the baseline partition algorithm with pseudocode, (3) an argument for the correctness of parallelizing the Bayesian inference, (4) different parallel computation strategies such as node-based parallelism and matrix-based parallelism, (5) evaluation metrics for partition correctness and computational requirements, (6) preliminary timing of a Python-based demonstration code and the open source C++ code, and (7) considerations for partitioning the graph in streaming fashion. Data sets and source code for the algorithm as well as metrics, with detailed documentation, are available at GraphChallenge.org.
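As a rough illustration of two ideas the abstract mentions, the sketch below summarizes a partitioned graph by its inter-block edge counts (the usual sufficient statistic in stochastic block models) and scores a candidate partition against a known one. The abstract does not say which correctness metrics the challenge uses; pairwise precision and recall are assumed here as one common choice, and the function names `block_edge_counts` and `pairwise_precision_recall`, along with the toy graph, are hypothetical and not taken from the challenge code.

```python
from itertools import combinations


def block_edge_counts(edges, partition, num_blocks):
    """Return M where M[r][s] counts directed edges from block r to block s."""
    counts = [[0] * num_blocks for _ in range(num_blocks)]
    for src, dst in edges:
        counts[partition[src]][partition[dst]] += 1
    return counts


def pairwise_precision_recall(truth, predicted):
    """Pairwise precision and recall (assumed metric, not from the paper).

    A node pair counts as positive when both nodes share a block.
    """
    same_both = same_pred = same_truth = 0
    for i, j in combinations(range(len(truth)), 2):
        in_truth = truth[i] == truth[j]
        in_pred = predicted[i] == predicted[j]
        same_truth += in_truth
        same_pred += in_pred
        same_both += in_truth and in_pred
    precision = same_both / same_pred if same_pred else 1.0
    recall = same_both / same_truth if same_truth else 1.0
    return precision, recall


# Toy directed graph: two dense triangles joined by one edge (made-up data).
edges = [(0, 1), (1, 0), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]
truth = [0, 0, 0, 1, 1, 1]       # block label per node
predicted = [0, 0, 1, 1, 1, 1]   # node 2 is misassigned

counts = block_edge_counts(edges, truth, 2)
precision, recall = pairwise_precision_recall(truth, predicted)
print(counts, precision, recall)
```

The pairwise loop is quadratic in the number of nodes, so it suits only small examples; a contingency-table formulation would be the natural replacement on large graphs.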


