Vectorising k-Core Decomposition for GPU Acceleration

32nd International Conference on Scientific and Statistical Database Management. Pub Date: 2020-07-07. DOI: 10.1145/3400903.3400931

Amir Mehrafsa, S. Chester, Alex Thomo


Citations: 4

Abstract

k-Core decomposition is a well-studied community detection problem in graph analytics in which each k-core of vertices induces a subgraph where all vertices have degree at least k. The decomposition is expensive to compute on large graphs, and efforts to apply massive parallelism have had limited success. This paper presents a vectorisation of the problem that reframes it as a composition of vector primitives on flat, 1D arrays. With such a formulation, we can deploy highly optimised Deep Learning GPU and SIMD frameworks. On a moderate GPU, using PyTorch, we obtain up to an 8× improvement over the best parallel state-of-the-art implementation, which is written in C++ and runs on an expensive 32-core machine. More importantly, our approach represents a novel abstraction showing that redesigning graph operations as a series of vectorised primitives makes highly parallel analytics both easier and more accessible for developers. We posit that such an approach can vastly accelerate the use of cheap GPU hardware in complex graph analytics.
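To make the idea of "a composition of vector primitives on flat, 1D arrays" concrete, here is a minimal sketch of a peeling-style k-core decomposition written only with masks, gathers, and histogram (scatter-add) operations. It is an illustration under stated assumptions, not the algorithm from the paper: it uses NumPy in place of PyTorch so that it runs without a GPU, and the function name `core_numbers` and its parameters are hypothetical.

```python
import numpy as np


def core_numbers(src, dst, num_vertices):
    """Return the core number of every vertex (illustrative sketch only).

    src, dst: 1D sequences listing each undirected edge once.
    """
    src = np.asarray(src, dtype=np.int64)
    dst = np.asarray(dst, dtype=np.int64)

    # Store each undirected edge in both directions as two flat arrays.
    u = np.concatenate([src, dst])
    v = np.concatenate([dst, src])

    # Degree of each vertex: a histogram over the edge sources.
    degree = np.bincount(u, minlength=num_vertices)
    core = np.zeros(num_vertices, dtype=np.int64)
    alive = np.ones(num_vertices, dtype=bool)

    k = 0
    while alive.any():
        # Mask: live vertices whose remaining degree is at most k.
        peel = alive & (degree <= k)
        if not peel.any():
            # Nothing left at this level: jump to the smallest live degree.
            k = int(degree[alive].min())
            continue
        core[peel] = k
        alive[peel] = False
        # Gather: edges that run from a peeled vertex to a live neighbour.
        hit = peel[u] & alive[v]
        # Scatter-add: each live neighbour loses one degree per such edge.
        degree = degree - np.bincount(v[hit], minlength=num_vertices)
    return core


# Triangle 0-1-2, a pendant vertex 3 attached to 2, and an isolated vertex 4.
print(core_numbers([0, 1, 0, 2], [1, 2, 2, 3], 5).tolist())  # [2, 2, 2, 1, 0]
```

Every step inside the loop is a whole-array operation with no per-vertex Python loop, which is the property that lets the same formulation be handed to a tensor framework. In PyTorch, `torch.bincount` and boolean-mask indexing could play the same roles on a GPU tensor, though whether the paper uses those exact calls is not stated in the abstract.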

