An OpenMP-only Linear Algebra Library for Distributed Architectures

Carla Cardoso, H. Yviquel, G. Valarini, Gustavo Leite, Rodrigo Ceccato, M. Pereira, Alan Souza, G. Araújo
2022 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW), November 2022. DOI: 10.1109/SBAC-PADW56527.2022.00013

Abstract

This paper presents OMPC PLASMA, a dense linear algebra library for distributed-memory systems. It leverages the OpenMP Cluster (OMPC) programming model to execute the PLASMA library with task parallelism on a distributed cluster architecture. The OpenMP Cluster model is used to define task regions, which the OMPC runtime then distributes across the cluster nodes. The runtime automatically manages task scheduling, inter-node communication, and fault tolerance. OMPC PLASMA modifies various PLASMA functions to distribute the matrix across the nodes and to perform the computation using the threads of each node. Experimental results show that, over the original single-node PLASMA implementation, OMPC PLASMA achieves speedups of 4.00x with 4 worker nodes, 7.00x with 8 worker nodes, and 12.00x with 16 worker nodes. Compared with ScaLAPACK, OMPC PLASMA achieves a 3.00x speedup on 4 worker nodes for a 90k×90k matrix.
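As a reading aid, the reported speedups can be converted into parallel efficiency, E = S / N, where S is the speedup over the single-node PLASMA baseline and N is the number of worker nodes. The short Python sketch does only this arithmetic on the figures quoted in the abstract. Treating the single-node run as the reference point for every node count is an assumption of this illustration. The paper is not stated to report this metric, and the names `SPEEDUPS` and `parallel_efficiency` are hypothetical.

```python
# Speedups quoted in the abstract, keyed by number of worker nodes.
# Assumed baseline: the original PLASMA implementation on a single node.
SPEEDUPS = {4: 4.00, 8: 7.00, 16: 12.00}


def parallel_efficiency(speedup: float, nodes: int) -> float:
    """Return E = S / N; 1.0 means perfectly linear scaling."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return speedup / nodes


if __name__ == "__main__":
    for nodes, speedup in SPEEDUPS.items():
        eff = parallel_efficiency(speedup, nodes)
        print(f"{nodes:2d} worker nodes: {speedup:.2f}x speedup, {eff:.1%} efficiency")
```

Under this reading, scaling is linear at 4 worker nodes (100% efficiency). Efficiency falls to 87.5% at 8 nodes and to 75% at 16 nodes.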