Sparta: high-performance, element-wise sparse tensor contraction on heterogeneous memory

Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. Pub Date: 2021-02-17. DOI: 10.1145/3437801.3441581

Jiawen Liu, Jie Ren, R. Gioiosa, Dong Li, Jiajia Li


Citations: 17

Abstract

Sparse tensor contractions appear in many applications. Efficiently computing the product of two sparse tensors is challenging: it not only inherits the challenges of common sparse matrix-matrix multiplication (SpGEMM), i.e., indirect memory access and an output size that is unknown before computation, but also raises new challenges because of the high dimensionality of tensors, expensive multi-dimensional index search, and massive intermediate and output data. To address these challenges, we introduce three optimization techniques: multi-dimensional, efficient hashtable representations for the accumulator and for the larger input tensor, and all-stage parallelization. Evaluating with 15 datasets, we show that Sparta brings a 28-576× speedup over the traditional sparse tensor contraction with a sparse accumulator. With our proposed algorithm- and memory heterogeneity-aware data management, Sparta brings extra performance improvement on heterogeneous memory with DRAM and Intel Optane DC Persistent Memory Module (PMM) over a state-of-the-art software-based data management solution, a hardware-based data management solution, and PMM-only by 30.7% (up to 98.5%), 10.7% (up to 28.3%), and 17% (up to 65.1%), respectively.
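The hashtable idea in the abstract can be illustrated with a minimal, sequential Python sketch. This is not Sparta's implementation: the function name `contract`, the dict-of-tuples tensor format, and the `x_modes`/`y_modes` parameters are all assumptions made for illustration, and the sketch omits parallelization and heterogeneous-memory data placement entirely. It only shows how hashing one input tensor on its contracted indices replaces a multi-dimensional index search with a lookup, and how a hash-based accumulator copes with an output size that is not known in advance.

```python
from collections import defaultdict


def contract(x, y, x_modes, y_modes):
    """Contract sparse tensors x and y element by element.

    x, y     -- dicts mapping an index tuple to a nonzero value (assumed format)
    x_modes  -- positions of the contracted modes in x's index tuples
    y_modes  -- matching positions of the contracted modes in y's index tuples
    Returns a dict mapping (free indices of x + free indices of y) to values.
    """
    # Hash table for y, keyed by the contracted indices, so that matching
    # nonzeros are found by one lookup instead of an index search.
    y_table = defaultdict(list)
    for idx, val in y.items():
        key = tuple(idx[m] for m in y_modes)
        free = tuple(i for m, i in enumerate(idx) if m not in y_modes)
        y_table[key].append((free, val))

    # Hash accumulator: the output size is unknown up front, so partial
    # products are summed under their output index as they are produced.
    acc = defaultdict(float)
    for idx, val in x.items():
        key = tuple(idx[m] for m in x_modes)
        x_free = tuple(i for m, i in enumerate(idx) if m not in x_modes)
        for y_free, y_val in y_table.get(key, ()):
            acc[x_free + y_free] += val * y_val
    return dict(acc)


# Usage: a sparse matrix product is the order-2 special case.
a = {(0, 0): 1.0, (0, 1): 2.0, (1, 1): 3.0}
b = {(0, 0): 4.0, (1, 0): 5.0, (1, 2): 6.0}
print(contract(a, b, x_modes=(1,), y_modes=(0,)))
```

In this sketch the hash table is built over `y` unconditionally; the abstract states that the hashtable representation is applied to the larger input tensor, so a caller following that choice would pass the larger tensor as `y`.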
