Closed-Form Solutions for Dense Matrix-Matrix Multiplication on Heterogeneous Platforms Using Divisible Load Analysis

G. Barlas, L. E. Hiny
Published in: 2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP), 2018-03-21
DOI: 10.1109/PDP2018.2018.00067 (https://doi.org/10.1109/PDP2018.2018.00067)

Abstract

In this paper we analytically solve the partitioning problem for performing matrix multiplication on a cluster of heterogeneous multicore machines, each equipped with an accelerator, typically a GPU. We derive closed-form solutions that not only solve the problem exactly but also allow for predictive analysis that can guide system design. Our work allows an optimum partitioning to be calculated in time linear in the number of cores in the system. The static partitioning afforded by our Divisible Load Theory (DLT) based analysis minimizes communication overhead and improves efficiency. Our work leverages existing optimized Dense Linear Algebra (DLA) libraries, such as cuBLAS and BLAS, which makes deployment easy and allows it to readily exploit state-of-the-art tools. A comparison study concludes the paper, highlighting the beneficial effect of our partitioning approach.
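
As a rough illustration of what a static, linear-time partitioning can look like, the sketch below splits the rows of the left-hand matrix among workers in proportion to their relative speeds, so that all workers would finish at about the same time. This is not the closed-form solution derived in the paper: it assumes a compute-only cost model with no communication cost and no distinction between CPU cores and accelerators, and the names `partition_rows` and `speeds` are hypothetical.

```python
from typing import List


def partition_rows(n_rows: int, speeds: List[float]) -> List[int]:
    """Split n_rows among workers in proportion to their speeds.

    Assumption (not from the paper): cost is compute-only, so a worker
    with speed s needs rows / s time units and communication is free.
    Runs in time linear in the number of workers.
    """
    if n_rows < 0 or not speeds or any(s <= 0 for s in speeds):
        raise ValueError("need n_rows >= 0 and strictly positive speeds")

    total_speed = sum(speeds)
    # Ideal (fractional) share of each worker, rounded down to whole rows.
    rows = [int(n_rows * s / total_speed) for s in speeds]

    # Rounding down can leave a few rows unassigned; hand them out one
    # per worker, starting from the first, so every row is covered.
    leftover = n_rows - sum(rows)
    for i in range(min(leftover, len(rows))):
        rows[i] += 1
    return rows


# Example: two CPU-like workers and one worker that is 8x faster.
shares = partition_rows(1000, [1.0, 1.0, 8.0])
print(shares)
```

Each worker would then multiply its block of rows by the full right-hand matrix using whatever optimized library is available locally. A model that also accounts for communication changes the shares, which is where an analysis such as the paper's comes in.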