CoAdELL: Adaptivity and Compression for Improving Sparse Matrix-Vector Multiplication on GPUs

Marco Maggioni, T. Berger-Wolf
Published in: 2014 IEEE International Parallel & Distributed Processing Symposium Workshops
DOI: 10.1109/IPDPSW.2014.106
Publication date: 2014-05-19
Citations: 7

Abstract

Numerous applications in science and engineering rely on sparse linear algebra. The efficiency of a fundamental kernel such as the Sparse Matrix-Vector multiplication (SpMV) is crucial for solving increasingly complex computational problems. However, SpMV is notorious for its extremely low arithmetic intensity and irregular memory access patterns, posing a challenge for optimization. Over the last few years, an extensive body of literature has been devoted to implementing SpMV on Graphics Processing Units (GPUs), with the aim of exploiting the available fine-grained parallelism and memory bandwidth. In this paper, we propose to efficiently combine adaptivity and compression into an ELL-based sparse format in order to advance the state of the art of SpMV on GPUs. The foundation of our work is AdELL, an efficient sparse data structure based on the idea of distributing working threads to rows according to their computational load, creating balanced hardware-level blocks (warps) while coping with irregular matrix structure. We designed a lightweight index compression scheme based on delta encoding and warp granularity that can be transparently embedded into AdELL, leading to an immediate performance benefit given the bandwidth-limited nature of SpMV. The proposed integration yields a highly optimized novel sparse matrix format known as Compressed Adaptive ELL (CoAdELL). We evaluated the effectiveness of our approach on a large set of benchmarks from heterogeneous application domains. The results show consistent improvements for double-precision SpMV calculations over the AdELL baseline. Moreover, we assessed the general relevance of CoAdELL with respect to other optimized GPU-based sparse matrix formats.
We drew a direct comparison with clSpMV and BRO-HYB, obtaining sufficient experimental evidence (a 33% geometric-average improvement over clSpMV and 43% over BRO-HYB) to position our work as the new state of the art.
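The two ideas the abstract combines can be illustrated on the CPU. The sketch below is our illustration, not the paper's implementation (`to_ell`, `delta_encode`, and `spmv_from_deltas` are hypothetical names): it builds an ELL layout by padding rows to a common width, delta-encodes each row's column indices so the stored differences are small enough to fit in narrow integers, and performs SpMV after recovering the absolute indices with a prefix sum. The real CoAdELL format additionally balances warps by row load and chooses delta bit-widths at warp granularity, which this sketch omits.

```python
import numpy as np

# A minimal CPU sketch of ELL padding plus delta-encoded column indices.
# This is an illustration of the general technique, not CoAdELL itself.

def to_ell(dense):
    """Pad each row's nonzeros to the width of the densest row (ELL)."""
    rows = dense.shape[0]
    width = max(int(np.count_nonzero(dense[r])) for r in range(rows))
    vals = np.zeros((rows, width))
    cols = np.zeros((rows, width), dtype=np.int64)
    for r in range(rows):
        idx = np.flatnonzero(dense[r])
        vals[r, :idx.size] = dense[r, idx]
        cols[r, :idx.size] = idx
        if idx.size:                       # repeat the last index in padding
            cols[r, idx.size:] = idx[-1]   # so every delta stays non-negative
    return vals, cols

def delta_encode(cols):
    """Store each row's first column index, then successive differences."""
    return np.diff(cols, axis=1, prepend=0)

def spmv_from_deltas(vals, deltas, x):
    """Recover absolute column indices with a prefix sum, then multiply."""
    cols = np.cumsum(deltas, axis=1)
    return (vals * x[cols]).sum(axis=1)

A = np.array([[4.0, 0.0, 1.0],
              [0.0, 2.0, 0.0],
              [3.0, 0.0, 5.0]])
x = np.array([1.0, 2.0, 3.0])
vals, cols = to_ell(A)
y = spmv_from_deltas(vals, delta_encode(cols), x)   # equals A @ x
```

Padded slots reuse the last valid column index and carry a zero value, so they contribute nothing to the product. The bandwidth saving comes from the deltas: since rows store sorted indices, most differences are small and could be packed into 8- or 16-bit integers instead of absolute 32-bit indices.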