Performance models for evaluation and automatic tuning of symmetric sparse matrix-vector multiply

International Conference on Parallel Processing, 2004. ICPP 2004. Pub Date : 2004-08-15 DOI:10.1109/ICPP.2004.1327917

Benjamin C. Lee, R. Vuduc, J. Demmel, K. Yelick

{"title":"Performance models for evaluation and automatic tuning of symmetric sparse matrix-vector multiply","authors":"Benjamin C. Lee, R. Vuduc, J. Demmel, K. Yelick","doi":"10.1109/ICPP.2004.1327917","DOIUrl":null,"url":null,"abstract":"We present optimizations for sparse matrix-vector multiply SpMV and its generalization to multiple vectors, SpMM, when the matrix is symmetric: (1) symmetric storage, (2) register blocking, and (3) vector blocking. Combined with register blocking, symmetry saves more than 50% in matrix storage. We also show performance speedups of 2.1/spl times/ for SpMV and 2.6/spl times/ for SpMM, when compared to the best nonsymmetric register blocked implementation. We present an approach for the selection of tuning parameters, based on empirical modeling and search that consists of three steps: (1) Off-line benchmark, (2) Runtime search, and (3) Heuristic performance model. This approach generally selects parameters to achieve performance with 85% of that achieved with exhaustive search. We evaluate our implementations with respect to upper bounds on performance. Our model bounds performance by considering only the cost of memory operations and using lower bounds on the number of cache misses. Our optimized codes are within 68% of the upper bounds.","PeriodicalId":106240,"journal":{"name":"International Conference on Parallel Processing, 2004. ICPP 2004.","volume":"60 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2004-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"69","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Conference on Parallel Processing, 2004. ICPP 2004.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICPP.2004.1327917","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 69

Abstract

We present optimizations for sparse matrix-vector multiply SpMV and its generalization to multiple vectors, SpMM, when the matrix is symmetric: (1) symmetric storage, (2) register blocking, and (3) vector blocking. Combined with register blocking, symmetry saves more than 50% in matrix storage. We also show performance speedups of 2.1/spl times/ for SpMV and 2.6/spl times/ for SpMM, when compared to the best nonsymmetric register blocked implementation. We present an approach for the selection of tuning parameters, based on empirical modeling and search that consists of three steps: (1) Off-line benchmark, (2) Runtime search, and (3) Heuristic performance model. This approach generally selects parameters to achieve performance with 85% of that achieved with exhaustive search. We evaluate our implementations with respect to upper bounds on performance. Our model bounds performance by considering only the cost of memory operations and using lower bounds on the number of cache misses. Our optimized codes are within 68% of the upper bounds.

查看原文本刊更多论文

对称稀疏矩阵向量乘法的评价与自动调优性能模型

当矩阵是对称的时，我们提出了稀疏矩阵-向量乘法SpMV及其推广到多个向量SpMM的优化方法:(1)对称存储，(2)寄存器阻塞，(3)向量阻塞。结合寄存器块，对称性节省了50%以上的矩阵存储。我们还展示了与最佳非对称寄存器阻塞实现相比，SpMV和SpMM的性能加速分别为2.1/spl times/和2.6/spl times/。我们提出了一种基于经验建模和搜索的调优参数选择方法，该方法包括三个步骤:(1)离线基准测试，(2)运行时搜索和(3)启发式性能模型。这种方法通常选择参数来实现性能，其性能是穷举搜索的85%。我们根据性能的上限来评估我们的实现。我们的模型仅考虑内存操作的成本，并使用缓存丢失次数的下限来限制性能。我们优化的代码在上限的68%以内。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

International Conference on Parallel Processing, 2004. ICPP 2004.

自引率

0.00%

发文量