FusedMM: A Unified SDDMM-SpMM Kernel for Graph Embedding and Graph Neural Networks

2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS). Pub Date: 2020-11-07. DOI: 10.1109/IPDPS49936.2021.00034

Md. Khaledur Rahman, Majedul Haque Sujon, A. Azad

Citations: 29

Abstract

We develop a fused matrix multiplication kernel that unifies sampled dense-dense matrix multiplication (SDDMM) and sparse-dense matrix multiplication (SpMM) under a single operation called FusedMM. By using user-defined functions, FusedMM can capture almost all computational patterns needed by popular graph embedding and GNN approaches. FusedMM is an order of magnitude faster than its equivalent kernels in Deep Graph Library. The superior performance of FusedMM comes from low-level vectorized kernels, a suitable load balancing scheme, and efficient utilization of the memory bandwidth. FusedMM can tune its performance using a code generator and performs equally well on Intel, AMD and ARM processors. FusedMM speeds up an end-to-end graph embedding algorithm by up to $28\times$ on different processors. The source code is available at https://github.com/HipGraph/FusedMM.
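
To make the idea of fusing SDDMM and SpMM concrete, here is a minimal pure-Python sketch. It is not the FusedMM API or its implementation (the real library uses low-level vectorized kernels and its own set of user-defined operations); the function names `sddmm_then_spmm` and `fused_mm`, the CSR arrays, and the single scalar `edge_fn` are all assumptions made for illustration. The unfused version stores one score per nonzero before aggregating, while the fused version consumes each score as soon as it is computed.

```python
import math

def sddmm_then_spmm(indptr, indices, X, Y, edge_fn):
    """Unfused baseline: SDDMM materializes one score per edge, then SpMM aggregates."""
    n, d = len(indptr) - 1, len(Y[0])
    scores = []  # intermediate sparse values, one per nonzero of the adjacency matrix
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            dot = sum(a * b for a, b in zip(X[i], Y[indices[k]]))
            scores.append(edge_fn(dot))
    Z = [[0.0] * d for _ in range(n)]
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            yj = Y[indices[k]]
            for c in range(d):
                Z[i][c] += scores[k] * yj[c]
    return Z

def fused_mm(indptr, indices, X, Y, edge_fn):
    """Fused sketch: each edge score is used immediately and never stored."""
    n, d = len(indptr) - 1, len(Y[0])
    Z = [[0.0] * d for _ in range(n)]
    for i in range(n):
        xi, zi = X[i], Z[i]
        for k in range(indptr[i], indptr[i + 1]):
            yj = Y[indices[k]]
            s = edge_fn(sum(a * b for a, b in zip(xi, yj)))  # SDDMM step for edge (i, j)
            for c in range(d):
                zi[c] += s * yj[c]                           # SpMM step for edge (i, j)
    return Z

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

# Hypothetical 3-node graph in CSR form: 0 -> {1, 2}, 1 -> {0}, 2 -> {1}
indptr = [0, 2, 3, 4]
indices = [1, 2, 0, 1]
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = X  # the same embeddings serve as both dense operands in this toy example

Z_unfused = sddmm_then_spmm(indptr, indices, X, Y, sigmoid)
Z_fused = fused_mm(indptr, indices, X, Y, sigmoid)
Z_identity = fused_mm(indptr, indices, X, Y, lambda t: t)
```

Both functions compute the same result; the fused loop simply avoids writing and re-reading the per-edge intermediate, which is the memory-traffic saving the abstract attributes to FusedMM. Swapping `edge_fn` stands in, very loosely, for the user-defined functions the abstract mentions.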
