CoAdELL: Adaptivity and Compression for Improving Sparse Matrix-Vector Multiplication on GPUs

Marco Maggioni, T. Berger-Wolf
Published in: 2014 IEEE International Parallel & Distributed Processing Symposium Workshops
DOI: 10.1109/IPDPSW.2014.106
Publication date: 2014-05-19
Citations: 7

Abstract

Numerous applications in science and engineering rely on sparse linear algebra. The efficiency of a fundamental kernel such as the Sparse Matrix-Vector multiplication (SpMV) is crucial for solving increasingly complex computational problems. However, SpMV is notorious for its extremely low arithmetic intensity and irregular memory access patterns, posing a challenge for optimization. Over the last few years, an extensive body of literature has been devoted to implementing SpMV on Graphics Processing Units (GPUs), with the aim of exploiting the available fine-grained parallelism and memory bandwidth. In this paper, we propose to efficiently combine adaptivity and compression into an ELL-based sparse format in order to advance the state of the art of SpMV on GPUs. The foundation of our work is AdELL, an efficient sparse data structure based on the idea of distributing working threads to rows according to their computational load, creating balanced hardware-level blocks (warps) while coping with irregular matrix structure. We designed a lightweight index compression scheme based on delta encoding and warp granularity that can be transparently embedded into AdELL, leading to an immediate performance benefit given the bandwidth-limited nature of SpMV. The proposed integration yields a highly optimized novel sparse matrix format known as Compressed Adaptive ELL (CoAdELL). We evaluated the effectiveness of our approach on a large set of benchmarks from heterogeneous application domains. The results show consistent improvements for double-precision SpMV calculations over the AdELL baseline. Moreover, we assessed the general relevance of CoAdELL with respect to other optimized GPU-based sparse matrix formats.
We drew a direct comparison with clSpMV and BRO-HYB, obtaining sufficient experimental evidence (a 33% geometric-average improvement over clSpMV and 43% over BRO-HYB) to position our work as the new state of the art.
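The two ideas the abstract combines can be illustrated on the CPU. The sketch below is our illustration, not the paper's implementation (`to_ell`, `delta_encode`, and `spmv_from_deltas` are hypothetical names): it builds an ELL layout by padding rows to a common width, delta-encodes each row's column indices so the stored differences are small enough to fit in narrow integers, and performs SpMV after recovering the absolute indices with a prefix sum. The real CoAdELL format additionally balances warps by row load and chooses delta bit-widths at warp granularity, which this sketch omits.

```python
import numpy as np

# A minimal CPU sketch of ELL padding plus delta-encoded column indices.
# This is an illustration of the general technique, not CoAdELL itself.

def to_ell(dense):
    """Pad each row's nonzeros to the width of the densest row (ELL)."""
    rows = dense.shape[0]
    width = max(int(np.count_nonzero(dense[r])) for r in range(rows))
    vals = np.zeros((rows, width))
    cols = np.zeros((rows, width), dtype=np.int64)
    for r in range(rows):
        idx = np.flatnonzero(dense[r])
        vals[r, :idx.size] = dense[r, idx]
        cols[r, :idx.size] = idx
        if idx.size:                       # repeat the last index in padding
            cols[r, idx.size:] = idx[-1]   # so every delta stays non-negative
    return vals, cols

def delta_encode(cols):
    """Store each row's first column index, then successive differences."""
    return np.diff(cols, axis=1, prepend=0)

def spmv_from_deltas(vals, deltas, x):
    """Recover absolute column indices with a prefix sum, then multiply."""
    cols = np.cumsum(deltas, axis=1)
    return (vals * x[cols]).sum(axis=1)

A = np.array([[4.0, 0.0, 1.0],
              [0.0, 2.0, 0.0],
              [3.0, 0.0, 5.0]])
x = np.array([1.0, 2.0, 3.0])
vals, cols = to_ell(A)
y = spmv_from_deltas(vals, delta_encode(cols), x)   # equals A @ x
```

Padded slots reuse the last valid column index and carry a zero value, so they contribute nothing to the product. The bandwidth saving comes from the deltas: since rows store sorted indices, most differences are small and could be packed into 8- or 16-bit integers instead of absolute 32-bit indices.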