Vectorising k-Core Decomposition for GPU Acceleration

Amir Mehrafsa, S. Chester, Alex Thomo
Published in: 32nd International Conference on Scientific and Statistical Database Management
Publication date: 2020-07-07
DOI: https://doi.org/10.1145/3400903.3400931
Citations: 4

Abstract

k-Core decomposition is a well-studied community detection problem in graph analytics in which each k-core of vertices induces a subgraph where all vertices have degree at least k. The decomposition is expensive to compute on large graphs, and efforts to apply massive parallelism have had limited success. This paper presents a vectorisation of the problem that reframes it as a composition of vector primitives on flat, 1D arrays. With such a formulation, we can deploy highly optimised Deep Learning GPU and SIMD frameworks. On a moderate GPU, using PyTorch, we obtain up to an 8× improvement over the best parallel state-of-the-art implementation, written in C++ and running on an expensive 32-core machine. More importantly, our approach represents a novel abstraction showing that redesigning graph operations as a series of vectorised primitives makes highly parallel analytics both easier and more accessible for developers. We posit that such an approach can vastly accelerate the use of cheap GPU hardware in complex graph analytics.
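To make the idea of "vector primitives on flat, 1D arrays" concrete, here is a minimal sketch of the classic peeling approach to k-core decomposition, written so that each step is a whole-array operation (mask, gather, scatter-add). It is not the authors' algorithm or code: the function names `symmetric_edges` and `core_numbers`, the edge-list layout, and the use of plain Python lists in place of GPU tensors are all assumptions made for illustration. In a tensor framework, each commented step would presumably map to a single vectorised call.

```python
def symmetric_edges(edges):
    """Hypothetical helper: store an undirected graph as two flat arrays,
    listing every edge once in each direction."""
    src = [u for u, v in edges] + [v for u, v in edges]
    dst = [v for u, v in edges] + [u for u, v in edges]
    return src, dst


def core_numbers(num_vertices, src, dst):
    """Return the core number of every vertex by repeated peeling.

    Illustrative sketch only; each step is phrased as an array-wide operation.
    """
    # Scatter-add of ones over src: the degree of every vertex.
    deg = [0] * num_vertices
    for u in src:
        deg[u] += 1

    alive = [True] * num_vertices
    core = [0] * num_vertices
    k = 0
    while any(alive):
        # Elementwise mask: surviving vertices whose degree is at most k.
        peel = [a and d <= k for a, d in zip(alive, deg)]
        if not any(peel):
            k += 1  # nothing left to peel at this level; move to the next k
            continue
        # Masked assignment: peeled vertices receive core number k.
        core = [k if p else c for p, c in zip(peel, core)]
        alive = [a and not p for a, p in zip(alive, peel)]
        # Gather over the edge arrays: edges from a vertex peeled in this
        # round to a vertex that is still alive.
        hit = [peel[u] and alive[v] for u, v in zip(src, dst)]
        # Scatter-add of -1 over dst: each such edge lowers a neighbour's degree.
        for v, h in zip(dst, hit):
            if h:
                deg[v] -= 1
    return core


# Usage: a triangle (0, 1, 2) with a pendant vertex 3 attached to vertex 0.
src, dst = symmetric_edges([(0, 1), (1, 2), (0, 2), (0, 3)])
print(core_numbers(4, src, dst))  # [2, 2, 2, 1]
```

The triangle vertices end up in the 2-core, while the pendant vertex only belongs to the 1-core. Because every step touches a whole array at once and no step depends on per-vertex pointer chasing, a formulation of this shape is the kind that can be handed to a SIMD or GPU tensor library.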