Scalable Direct-Iterative Hybrid Solver for Sparse Matrices on Multi-Core and Vector Architectures

Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region. Published: 2020-01-15. DOI: 10.1145/3368474.3368484

K. Ono, Toshihiro Kato, S. Ohshima, T. Nanri


Citations: 2

Abstract

In the present paper, we propose an efficient direct-iterative hybrid solver for sparse matrices that can exploit the scalability of the latest multi-core, many-core, and vector architectures, and we examine the execution performance of the proposed SLOR-PCR method (successive line over-relaxation combined with parallel cyclic reduction). We also present an efficient implementation of the PCR algorithm for SIMD and vector architectures, structured so that the compiler can readily generate optimized instructions. The proposed hybrid method has high cache reusability, which favors modern architectures with a low B/F (bytes-per-flop) ratio, because efficient use of the cache can mitigate the memory bandwidth limitation. The measured performance revealed that the SLOR-PCR solver scaled excellently up to 352 cores in the cc-NUMA environment, and that its performance exceeded that of the conventional Jacobi and Red-Black ordering methods by a factor of 3.6 to 8.3 on the SIMD architecture. In addition, the maximum speedup in computation time was a factor of 6.3 on the cc-NUMA architecture with 352 cores.
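The following is a minimal sketch of how an SLOR-PCR combination could fit together. It is based on our reading of the method's name rather than on the paper's code. We assume an outer successive line over-relaxation sweep over a 2D five-point Poisson problem, with each x-line tridiagonal system solved directly by parallel cyclic reduction. The function names `pcr_solve` and `slor_pcr`, the 2D model problem, the relaxation factor, and the max-change stopping test are all hypothetical illustrations. The paper's SIMD/vector kernels, data layout, and test problems may differ. Pure Python keeps the sketch self-contained; the per-row inner loop in `pcr_solve` is the part a compiler would vectorize.

```python
def pcr_solve(a, b, c, d):
    """Solve a tridiagonal system by parallel cyclic reduction (PCR).

    Row i reads a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i]; a[0] and
    c[n-1] are ignored (treated as zero). Each step rewrites every row
    from the previous step's arrays only, so rows are independent.
    """
    n = len(b)
    a = [0.0] + list(a[1:])
    c = list(c[:-1]) + [0.0]
    b, d = list(b), list(d)
    s = 1  # current coupling distance
    while s < n:
        na, nb, nc, nd = [0.0] * n, [0.0] * n, [0.0] * n, [0.0] * n
        for i in range(n):  # no loop-carried dependence: SIMD-friendly
            lo, hi = i - s >= 0, i + s < n
            alpha = -a[i] / b[i - s] if lo else 0.0
            gamma = -c[i] / b[i + s] if hi else 0.0
            na[i] = alpha * a[i - s] if lo else 0.0
            nc[i] = gamma * c[i + s] if hi else 0.0
            nb[i] = (b[i] + (alpha * c[i - s] if lo else 0.0)
                     + (gamma * a[i + s] if hi else 0.0))
            nd[i] = (d[i] + (alpha * d[i - s] if lo else 0.0)
                     + (gamma * d[i + s] if hi else 0.0))
        a, b, c, d = na, nb, nc, nd
        s *= 2  # rows now couple unknowns 2s apart
    # Every off-diagonal coupling now points outside the system.
    return [d[i] / b[i] for i in range(n)]


def slor_pcr(f, h, omega=1.5, tol=1e-8, max_iter=10_000):
    """Hypothetical SLOR sweep for -lap(u) = f, u = 0 on the boundary.

    f is an ny-by-nx list of interior values on a grid of spacing h.
    Each x-line is solved exactly with pcr_solve (direct part); the
    line sweep with over-relaxation is the iterative part.
    Returns (u, iterations).
    """
    ny, nx = len(f), len(f[0])
    u = [[0.0] * nx for _ in range(ny)]
    a, b, c = [-1.0] * nx, [4.0] * nx, [-1.0] * nx
    zero = [0.0] * nx
    for it in range(1, max_iter + 1):
        max_change = 0.0
        for j in range(ny):
            below = u[j - 1] if j > 0 else zero       # already updated
            above = u[j + 1] if j < ny - 1 else zero  # previous iterate
            rhs = [h * h * f[j][i] + below[i] + above[i] for i in range(nx)]
            line = pcr_solve(a, b, c, rhs)
            for i in range(nx):
                new = u[j][i] + omega * (line[i] - u[j][i])
                max_change = max(max_change, abs(new - u[j][i]))
                u[j][i] = new
        if max_change < tol:
            return u, it
    return u, max_iter
```

Each PCR step applies the same arithmetic to every row without a loop-carried dependence, which is why the algorithm maps well onto SIMD lanes. The cost is O(n log n) work, compared with O(n) for the sequential Thomas algorithm. Keeping one line's coefficients resident for its whole solve is one plausible way such a hybrid gains the cache reuse the abstract describes.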
