An OpenMP-only Linear Algebra Library for Distributed Architectures

2022 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW). Pub Date: 2022-11-01. DOI: 10.1109/SBAC-PADW56527.2022.00013

Carla Cardoso, H. Yviquel, G. Valarini, Gustavo Leite, Rodrigo Ceccato, M. Pereira, Alan Souza, G. Araújo



Abstract

This paper presents OMPC PLASMA, a dense linear algebra library for distributed-memory systems. It leverages the OpenMP Cluster (OMPC) programming model to run the PLASMA library with task parallelism on a distributed cluster architecture. The OpenMP Cluster model is used to define task regions, which the OMPC runtime then distributes across the cluster nodes while automatically managing task scheduling, inter-node communication, and fault tolerance. The OMPC PLASMA library modifies various PLASMA functions so that the matrix is distributed across the nodes and the computation is performed by the threads of each node. Experimental results show that OMPC PLASMA achieves speedups of 4.00x with 4 worker nodes, 7.00x with 8 worker nodes, and 12.00x with 16 worker nodes over the original single-node implementation. Compared with ScaLAPACK, OMPC PLASMA achieves a 3.00x speedup on 4 worker nodes for a matrix of size 90k×90k.
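To put the reported speedups in perspective, the short Python sketch below computes the parallel efficiency they imply, using the standard definition of efficiency as speedup divided by the number of workers. It is an illustration and not part of the paper: the names `REPORTED_SPEEDUP`, `parallel_efficiency`, and `efficiency_table` are hypothetical, and it assumes that the worker-node count is the right divisor (the abstract does not say whether a separate head node is involved).

```python
# Speedups over the single-node implementation, as reported in the abstract.
# Keys are worker-node counts, values are the reported speedups.
REPORTED_SPEEDUP = {4: 4.00, 8: 7.00, 16: 12.00}


def parallel_efficiency(speedup: float, workers: int) -> float:
    """Standard definition: speedup divided by the number of workers."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    return speedup / workers


def efficiency_table(speedups: dict) -> dict:
    """Map each worker count to its implied efficiency (assumption: the
    worker-node count is the divisor; the abstract does not confirm this)."""
    return {n: parallel_efficiency(s, n) for n, s in sorted(speedups.items())}


if __name__ == "__main__":
    for n, eff in efficiency_table(REPORTED_SPEEDUP).items():
        print(f"{n:>2} worker nodes: {eff:.1%} efficiency")
```

Under these assumptions the implied efficiency falls from 100% at 4 worker nodes to 87.5% at 8 and 75% at 16, so the scaling is sublinear beyond 4 nodes.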


