Generalised vectorisation for sparse matrix-vector multiplication

Proceedings of the 5th Workshop on Irregular Applications: Architectures and Algorithms. Pub Date: 2015-11-15. DOI: 10.1145/2833179.2833185

A. N. Yzelman

Citations: 10

Abstract

This work generalises the various ways in which a sparse matrix-vector (SpMV) multiplication can be vectorised. It arrives at a novel data structure that generalises three earlier well-known data structures for sparse computations: the Blocked CRS format, the (sliced) ELLPACK format, and segmented scan based formats. The new data structure is relevant since efficient use of new hardware requires the use of increasingly wide vector registers. Normally, the use of vectorisation for sparse computations is limited due to bandwidth constraints. In cases where computations are limited by memory latencies instead of memory bandwidth, however, vectorisation can still help performance. The Intel Xeon Phi, appearing as a component in several top-500 supercomputers, displays exactly this behaviour for SpMV multiplication. On this architecture the use of the new generalised vectorisation scheme increases performance up to 178 percent.
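
For readers unfamiliar with the formats named in the abstract, the following is a minimal sketch of SpMV in plain CRS and in unsliced ELLPACK. It is an illustration only and is not the generalised data structure the paper proposes. The function names (`spmv_crs`, `crs_to_ellpack`, `spmv_ellpack`) and the choice of padding with column index 0 and value 0.0 are assumptions made for this example.

```python
# Illustrative sketch only: textbook CRS and ELLPACK SpMV.
# This is NOT the paper's generalised vectorisation scheme; all names
# here are hypothetical and chosen for the example.

def spmv_crs(row_ptr, col_idx, values, x):
    """Compute y = A x with A stored in Compressed Row Storage (CRS)."""
    m = len(row_ptr) - 1
    y = [0.0] * m
    for i in range(m):
        # Row i owns the nonzeroes at positions row_ptr[i] .. row_ptr[i+1]-1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y


def crs_to_ellpack(row_ptr, col_idx, values):
    """Pad every row to the longest row length (ELLPACK layout)."""
    m = len(row_ptr) - 1
    width = max((row_ptr[i + 1] - row_ptr[i] for i in range(m)), default=0)
    # Padding entries use column 0 and value 0.0 (an assumption of this
    # sketch), so they contribute nothing to the result.
    ell_cols = [[0] * width for _ in range(m)]
    ell_vals = [[0.0] * width for _ in range(m)]
    for i in range(m):
        for j, k in enumerate(range(row_ptr[i], row_ptr[i + 1])):
            ell_cols[i][j] = col_idx[k]
            ell_vals[i][j] = values[k]
    return ell_cols, ell_vals


def spmv_ellpack(ell_cols, ell_vals, x):
    """Compute y = A x with A stored in ELLPACK."""
    m = len(ell_vals)
    width = len(ell_vals[0]) if m else 0
    y = [0.0] * m
    # The inner loop does the same operation for every row, which is the
    # regular access pattern that maps onto the lanes of a vector register.
    for j in range(width):
        for i in range(m):
            y[i] += ell_vals[i][j] * x[ell_cols[i][j]]
    return y
```

The trade-off the sketch exposes is that ELLPACK gains a regular, lane-friendly loop at the cost of storing and multiplying padding; slicing the rows into groups, as in the sliced ELLPACK format the abstract mentions, limits that padding to the longest row within each group.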
