Dynamically Balanced Synchronization-Avoiding LU Factorization with Multicore and GPUs

Simplice Donfack, S. Tomov, J. Dongarra
{"title":"Dynamically Balanced Synchronization-Avoiding LU Factorization with Multicore and GPUs","authors":"Simplice Donfack, S. Tomov, J. Dongarra","doi":"10.1109/IPDPSW.2014.109","DOIUrl":null,"url":null,"abstract":"Graphics processing units (GPUs) brought huge performance improvements in the scientific and numerical fields. We present an efficient hybrid CPU/GPU approach that is portable, dynamically and efficiently balances the workload between the CPUs and the GPUs, and avoidsdata transfer bottlenecks that are frequently present in numerical algorithms. Our approach determines the amount of initial work to assign to the CPUs before the execution, and then dynamically balances workloads during the execution. Then, we present a theoretical model to guide the choice of the initial amount of work for the CPUs. The validation of our model allows our approach to self-adapt on any architecture using the manufacturer's characteristics of the underlying machine. We illustrate our method for the LU factorization. For this case, we show that the use of our approach combined with a communication avoiding LU algorithm is efficient. For example, our experiments on a 24 cores AMD opteron 6172 show that by adding one GPU (Tesla S2050) we accelerate LU up to 2.4× compared to the corresponding routine in MKL using 24 cores. The comparisons with MAGMA also show significant improvements.","PeriodicalId":153864,"journal":{"name":"2014 IEEE International Parallel & Distributed Processing Symposium Workshops","volume":"65 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"14","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 IEEE International Parallel & Distributed Processing Symposium Workshops","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPSW.2014.109","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 14

Abstract

Graphics processing units (GPUs) have brought huge performance improvements to scientific and numerical computing. We present an efficient hybrid CPU/GPU approach that is portable, dynamically and efficiently balances the workload between the CPUs and the GPUs, and avoids the data-transfer bottlenecks that are frequently present in numerical algorithms. Our approach determines the amount of initial work to assign to the CPUs before execution, and then dynamically balances the workload during execution. We also present a theoretical model to guide the choice of the initial amount of work for the CPUs. The validation of our model allows our approach to self-adapt to any architecture using the manufacturer's characteristics of the underlying machine. We illustrate our method on LU factorization. For this case, we show that combining our approach with a communication-avoiding LU algorithm is efficient. For example, our experiments on a 24-core AMD Opteron 6172 show that by adding one GPU (Tesla S2050) we accelerate LU by up to 2.4× compared to the corresponding routine in MKL using 24 cores. Comparisons with MAGMA also show significant improvements.
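The abstract describes two scheduling ingredients: a static initial split of work between the CPUs and the GPU, guided by a model built from the manufacturer's peak rates, followed by dynamic rebalancing during the factorization. The sketch below is not the paper's implementation; it is a minimal illustration of that idea under assumed names and a deliberately simple linear model (initial_cpu_fraction, rebalance, and the damping factor are all hypothetical).

# Illustrative sketch (not the authors' code): pick an initial CPU share of the
# trailing-matrix work from peak-performance ratios, then adjust it from
# measured per-step timings. All names and the model are assumptions.

def initial_cpu_fraction(cpu_peak_gflops, gpu_peak_gflops):
    """Static split guided by manufacturer peak rates (hypothetical model)."""
    return cpu_peak_gflops / (cpu_peak_gflops + gpu_peak_gflops)

def rebalance(cpu_fraction, t_cpu, t_gpu, damping=0.5):
    """Shift work toward the device that finished its share earlier.

    If the CPU part of the last step took t_cpu seconds and the GPU part took
    t_gpu seconds, the fraction f* that equalizes finish times satisfies
    f*/rate_cpu = (1 - f*)/rate_gpu; we move part of the way toward it.
    """
    rate_cpu = cpu_fraction / t_cpu          # work processed per second on the CPUs
    rate_gpu = (1.0 - cpu_fraction) / t_gpu  # work processed per second on the GPU
    target = rate_cpu / (rate_cpu + rate_gpu)
    return cpu_fraction + damping * (target - cpu_fraction)

if __name__ == "__main__":
    # Example with made-up peak rates for a multicore CPU and one GPU.
    f = initial_cpu_fraction(cpu_peak_gflops=200.0, gpu_peak_gflops=500.0)
    print(f"initial CPU fraction: {f:.2f}")
    # Suppose the GPU finished its share faster than the CPUs on one step:
    f = rebalance(f, t_cpu=1.2, t_gpu=0.8)
    print(f"rebalanced CPU fraction: {f:.2f}")

In this toy model the initial fraction depends only on published peak rates, which is what lets the scheme self-adapt to a new machine without tuning runs; the per-step correction then absorbs whatever the static model got wrong.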