Closed-Form Solutions for Dense Matrix-Matrix Multiplication on Heterogeneous Platforms Using Divisible Load Analysis

2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP). Pub Date: 2018-03-21. DOI: 10.1109/PDP2018.2018.00067

G. Barlas, L. E. Hiny


Abstract

In this paper we analytically solve the partitioning problem for performing matrix multiplication on a cluster of heterogeneous multicore machines equipped with accelerators, typically GPUs. We derive closed-form solutions that not only solve the problem exactly but also enable predictive analysis that can guide system design. These solutions allow an optimum partitioning to be calculated in time linear in the number of cores in the system. The static partitioning produced by our Divisible Load Theory (DLT) based analysis minimizes communication overhead and improves efficiency. Our approach builds on existing optimized Dense Linear Algebra (DLA) libraries, such as cuBLAS and BLAS, so it is easy to deploy and can readily exploit state-of-the-art tools. A comparison study concludes the paper, highlighting the benefits of our partitioning approach.
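
To make the idea of a static, load-proportional partitioning concrete, the following is a minimal sketch and not the paper's closed-form solution. It assumes a deliberately simplified model: the rows of the left-hand matrix are the divisible load, each worker has a known constant throughput, and communication cost is ignored entirely. Under those assumptions, the usual DLT principle that all workers should finish at the same time reduces to giving each worker a share proportional to its speed. The function name `partition_rows` and the `speeds` parameter are hypothetical names introduced for illustration.

```python
from fractions import Fraction


def partition_rows(total_rows, speeds):
    """Split total_rows among workers in proportion to their speeds.

    speeds[i] is an assumed constant throughput (rows per unit time) of
    worker i. Communication cost is ignored, which the paper's analysis
    does not do; this only illustrates the equal-finish-time idea.
    """
    if total_rows < 0 or not speeds or any(s <= 0 for s in speeds):
        raise ValueError("need total_rows >= 0 and strictly positive speeds")

    total_speed = sum(Fraction(s) for s in speeds)
    # Ideal (possibly fractional) share for each worker.
    ideal = [total_rows * Fraction(s) / total_speed for s in speeds]
    # Round down, then hand the leftover rows to the workers with the
    # largest fractional remainders so the parts sum to total_rows.
    rows = [int(x) for x in ideal]
    leftover = total_rows - sum(rows)
    by_remainder = sorted(range(len(speeds)),
                          key=lambda i: ideal[i] - rows[i],
                          reverse=True)
    for i in by_remainder[:leftover]:
        rows[i] += 1
    return rows


if __name__ == "__main__":
    # Two CPU cores and one accelerator assumed to be 8x faster than a core.
    print(partition_rows(1000, [1, 1, 8]))
```

Each worker would then multiply its block of rows by the full right-hand matrix using whatever optimized library is available locally, such as a BLAS or cuBLAS matrix-multiply routine. A realistic model would also account for data transfer time, which is what the closed-form solutions in the paper address.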
