Distributed GPU Based Matrix Power Kernel for Geoscience Applications

A. Sedrakian, T. Guignon
{"title":"Distributed GPU Based Matrix Power Kernel for Geoscience Applications","authors":"A. Sedrakian, T. Guignon","doi":"10.2118/203947-ms","DOIUrl":null,"url":null,"abstract":"\n High-performance computing is at the heart of digital technology which allows to simulate complex physical phenomena. The current trend for hardware architectures is toward heterogeneous systems with multi-core CPUs accelerated by GPUs to get high computing power. The demand for fast solution of Geoscience simulations coupled with new computing architectures drives the need for challenging parallel algorithms. Such applications based on partial differential equations, requires to solve large and sparse linear system of equations. This work makes a step further in Matrix Powers Kernel (MPK) which is a crucial kernel in solving sparse linear systems using communication-avoiding methods. This class of methods deals with the degradation of performances observed beyond several nodes by decreasing the gap between the time necessary to perform the computations and the time needed to communicate the results. The proposed work consists of a new formulation for distributed MPK kernels for the cluster of GPUs where the pipeline communications could be overlapped by the computation. Also, appropriate data reorganization decreases the memory traffic between processors and accelerators and improves performance. The proposed structure is based on the separation of local and external components with different layers of interface nodes-due to the MPK algorithm-. The data is restructured in a way where all the data required by the neighbor process comes contiguously at the end, after the local one. Thanks to an assembly step, the contents of the messages for each neighbor are determined. Such data structure has a major impact on the efficiency of the solution, since it permits to design an appropriate communication scheme where the computation with local data can occur on the GPUs and the external ones on the CPUs. Moreover, it permits more efficient inter-process communication by an effective overlap of the communication by the computation in the asynchronous pipeline way. We validate our design through the test cases with different block matrices obtained from different reservoir simulations : fractured reservoir dual-medium, black-oil two phase-flow, and three phase-flow models. The experimental results demonstrate the performance of the proposed approach compared to state of the art. The proposed MPK running on several nodes of the GPU cluster provides a significant performance gain over equivalent Sparse Matrix Vector product (SpMV) which is already optimized and provides better scalability.","PeriodicalId":11146,"journal":{"name":"Day 1 Tue, October 26, 2021","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2021-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Day 1 Tue, October 26, 2021","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2118/203947-ms","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

High-performance computing is at the heart of the digital technologies that make it possible to simulate complex physical phenomena. The current trend in hardware architectures is toward heterogeneous systems in which multi-core CPUs are accelerated by GPUs to obtain high computing power. The demand for fast solutions of geoscience simulations, coupled with these new computing architectures, drives the need for challenging parallel algorithms. Such applications, based on partial differential equations, require the solution of large, sparse linear systems of equations. This work takes a step further on the Matrix Powers Kernel (MPK), a crucial kernel for solving sparse linear systems with communication-avoiding methods. This class of methods addresses the performance degradation observed beyond a few nodes by narrowing the gap between the time needed to perform the computations and the time needed to communicate the results. The proposed work consists of a new formulation of the distributed MPK for GPU clusters in which pipelined communications can be overlapped with computation. In addition, an appropriate data reorganization decreases the memory traffic between processors and accelerators and improves performance. The proposed structure is based on the separation of local and external components, with several layers of interface nodes as required by the MPK algorithm. The data are restructured so that all entries required by neighboring processes are stored contiguously at the end, after the local ones. An assembly step determines the contents of the message sent to each neighbor. This data structure has a major impact on the efficiency of the solution, since it makes it possible to design a communication scheme in which computation on local data takes place on the GPUs while computation on external data takes place on the CPUs. Moreover, it enables more efficient inter-process communication by effectively overlapping communication with computation in an asynchronous, pipelined fashion. We validate our design on test cases with block matrices obtained from different reservoir simulations: a dual-medium fractured-reservoir model, a black-oil two-phase-flow model, and a three-phase-flow model. The experimental results demonstrate the performance of the proposed approach compared with the state of the art. Running on several nodes of a GPU cluster, the proposed MPK provides a significant performance gain over the equivalent, already-optimized sparse matrix-vector product (SpMV) kernel and offers better scalability.
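To make the kernel concrete, the sketch below (not taken from the paper; the function names, the two-subdomain partition, and the SciPy-based formulation are assumptions made purely for illustration) shows the two ingredients the abstract describes: splitting a subdomain's matrix into a local block and an external coupling block toward ghost rows owned by a neighbor, and computing the matrix-powers basis [x, Ax, ..., A^s x] with one halo exchange per step. In the authors' distributed implementation the local product runs on the GPU, the external contribution is handled on the CPU, and the exchange is an asynchronous MPI communication overlapped with the computation; here the exchange is only mimicked with a callback.

```python
# Illustrative sketch only (not the authors' code): matrix-powers basis on
# one subdomain whose unknowns are split into "owned" rows and the "ghost"
# rows received from a neighbouring process.
import numpy as np
import scipy.sparse as sp

def split_subdomain(A_dense, owned, ghosts):
    """Return the owned->owned block and the ghost->owned coupling block."""
    A_loc = sp.csr_matrix(A_dense[np.ix_(owned, owned)])   # local coupling
    A_ext = sp.csr_matrix(A_dense[np.ix_(owned, ghosts)])  # coupling to neighbour rows
    return A_loc, A_ext

def matrix_powers(A_loc, A_ext, x_owned, get_ghosts, s):
    """Return [x, A x, ..., A^s x] restricted to the owned unknowns.

    get_ghosts(step) stands in for the halo exchange: it must return the
    ghost entries of A^(step-1) x. In the distributed kernel this exchange
    is asynchronous and overlapped with the local (GPU-side) product,
    while the external contribution is applied on the CPU side.
    """
    basis = [x_owned]
    v = x_owned
    for step in range(1, s + 1):
        g = get_ghosts(step)          # "communication" part
        v = A_loc @ v + A_ext @ g     # "computation" part: local + external
        basis.append(v)
    return basis

# Toy check on a 1-D Laplacian split between two subdomains: this rank owns
# rows 0..3 and, because the stencil is tridiagonal, it only needs row 4
# from its neighbour at each step.
n, s = 8, 4
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
owned, ghosts = np.arange(0, 4), np.arange(4, 5)
A_loc, A_ext = split_subdomain(A.toarray(), owned, ghosts)

x = np.ones(n)
reference = [x]                       # globally computed basis, used as the "neighbour"
for _ in range(s):
    reference.append(A @ reference[-1])

basis = matrix_powers(A_loc, A_ext, x[owned],
                      lambda step: reference[step - 1][ghosts], s)
assert np.allclose(basis[s], reference[s][owned])
```

In the toy check the ghost values for step k are read from a globally computed reference basis; in the real kernel they are produced by the neighboring process at step k-1 and transmitted while the local block of step k is being computed, which is the overlap the abstract refers to. Note also that a true communication-avoiding MPK gathers several layers of interface nodes up front so that s steps can be performed with fewer messages; the per-step exchange here is a simplification to keep the sketch short.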