Diagonally-Addressed Matrix Nicknack: How to improve SpMV performance

PAMM, Pub Date: 2023-07-12, DOI: 10.1002/pamm.202300228

J. Saak, J. Schulze


Citation count: 0

Abstract



We suggest a technique to reduce the storage size of sparse matrices at no loss of information. We call this technique Diagonally-Addressed (DA) storage. It exploits the typically low bandwidth of matrices arising in applications. For memory-bound algorithms, the resulting reduction in memory traffic has direct benefits for both uni-precision and multi-precision algorithms. In particular, we demonstrate how to apply DA storage to the Compressed Sparse Rows (CSR) format and compare the performance in computing the sparse matrix-vector (SpMV) product, which is a basic building block of many iterative algorithms. We investigate 1367 matrices from the SuiteSparse Matrix Collection that fit into the CSR format using signed 32-bit indices. More than 95% of these matrices fit into the DA-CSR format using 16-bit column indices, potentially after Reverse Cuthill-McKee (RCM) reordering. Using IEEE 754 double-precision scalars, we observe a performance uplift of 11% (single-threaded) or 17.5% (multithreaded) on average when the traffic exceeds the size of the last-level CPU cache. The predicted uplift in this scenario is 20%. For traffic within the CPU's combined level 2 and level 3 caches, the multithreaded performance uplift is over 40% for a few test matrices.
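To make the idea in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that DA storage replaces each CSR column index j in row i by the signed offset j - i from the diagonal, so that a low-bandwidth matrix needs only 16-bit indices. The function names `csr_to_da_csr`, `spmv_da_csr`, and `traffic_ratio` are hypothetical. The traffic estimate counts only one scalar and one column index per nonzero and ignores row pointers and vector traffic. Under that simplification, 12 bytes versus 10 bytes per nonzero gives a ratio of 1.2, which appears consistent with the predicted 20% uplift.

```python
from array import array

INT16_MIN, INT16_MAX = -2**15, 2**15 - 1


def csr_to_da_csr(row_ptr, col_idx):
    """Replace each column index j in row i by the diagonal offset j - i.

    Assumption: offsets are stored as signed 16-bit integers. An
    OverflowError signals that the bandwidth is too large for this
    index width (a reordering such as RCM might reduce it).
    """
    offsets = array("h")  # signed 16-bit integers on common platforms
    for i in range(len(row_ptr) - 1):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            d = col_idx[k] - i
            if not INT16_MIN <= d <= INT16_MAX:
                raise OverflowError(f"offset {d} in row {i} exceeds 16 bits")
            offsets.append(d)
    return offsets


def spmv_da_csr(row_ptr, offsets, values, x):
    """Compute y = A x, recovering each column index as i + offset."""
    n_rows = len(row_ptr) - 1
    y = array("d", [0.0]) * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[i + offsets[k]]
        y[i] = acc
    return y


def traffic_ratio(value_bytes=8, csr_index_bytes=4, da_index_bytes=2):
    """Bytes per nonzero in CSR divided by bytes per nonzero in DA-CSR.

    Simplified model: row pointers and the vectors x and y are ignored.
    """
    return (value_bytes + csr_index_bytes) / (value_bytes + da_index_bytes)


# 3x3 tridiagonal example: [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
row_ptr = array("i", [0, 2, 5, 7])
col_idx = array("i", [0, 1, 0, 1, 2, 1, 2])
values = array("d", [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0])

offsets = csr_to_da_csr(row_ptr, col_idx)
y = spmv_da_csr(row_ptr, offsets, values, [1.0, 2.0, 3.0])
```

The conversion is lossless because the row index i is known from the CSR row pointers while traversing a row, so the original column index can always be rebuilt as i plus the stored offset.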

