数据移动受限条件下稀疏矩阵的可编程加速

2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2019-05-01 DOI:10.1109/IPDPSW.2019.00016

Arjun Rawal, Yuanwei Fang, A. Chien

{"title":"数据移动受限条件下稀疏矩阵的可编程加速","authors":"Arjun Rawal, Yuanwei Fang, A. Chien","doi":"10.1109/IPDPSW.2019.00016","DOIUrl":null,"url":null,"abstract":"Data movement cost is a critical performance concern in today's computing systems. We propose a heterogeneous architecture that combines a CPU core with an efficient data recoding accelerator and evaluate it on sparse matrix computation. Such computations underly a wide range of important computations such as partial differential equation solvers, sequence alignment, and machine learning and are often data movement limited. The data recoding accelerator is orders of magnitude more energy efficient than a conventional CPU for recoding, allowing sparse matrix representation to be optimized for data movement. We evaluate the heterogeneous system with a recoding accelerator using the TAMU sparse matrix library, studying >369 diverse sparse matrix examples finding geometric mean performance benefits of 2.4x. In contrast, CPU's exhibit poor recoding performance (up to 30x worse), making data representation optimization infeasible. Holding SpMV performance constant, adding the recoding optimization and accelerator can produce power reductions of 63% and 51% on DDR and HBM-based memory systems, respectively, when evaluated on a set of 7 representative matrices. These results show the promise of this new heterogeneous architecture approach.","PeriodicalId":292054,"journal":{"name":"2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":"{\"title\":\"Programmable Acceleration for Sparse Matrices in a Data-Movement Limited World\",\"authors\":\"Arjun Rawal, Yuanwei Fang, A. Chien\",\"doi\":\"10.1109/IPDPSW.2019.00016\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Data movement cost is a critical performance concern in today's computing systems. We propose a heterogeneous architecture that combines a CPU core with an efficient data recoding accelerator and evaluate it on sparse matrix computation. Such computations underly a wide range of important computations such as partial differential equation solvers, sequence alignment, and machine learning and are often data movement limited. The data recoding accelerator is orders of magnitude more energy efficient than a conventional CPU for recoding, allowing sparse matrix representation to be optimized for data movement. We evaluate the heterogeneous system with a recoding accelerator using the TAMU sparse matrix library, studying >369 diverse sparse matrix examples finding geometric mean performance benefits of 2.4x. In contrast, CPU's exhibit poor recoding performance (up to 30x worse), making data representation optimization infeasible. Holding SpMV performance constant, adding the recoding optimization and accelerator can produce power reductions of 63% and 51% on DDR and HBM-based memory systems, respectively, when evaluated on a set of 7 representative matrices. These results show the promise of this new heterogeneous architecture approach.\",\"PeriodicalId\":292054,\"journal\":{\"name\":\"2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)\",\"volume\":\"8 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"8\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IPDPSW.2019.00016\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPSW.2019.00016","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 8

摘要

在当今的计算系统中，数据移动成本是一个关键的性能问题。我们提出了一种将CPU核心与高效数据编码加速器相结合的异构架构，并在稀疏矩阵计算上对其进行了评估。这些计算是许多重要计算的基础，如偏微分方程求解、序列对齐和机器学习，并且通常是数据移动受限的。数据重新编码加速器的能效比传统的CPU要高几个数量级，允许对数据移动进行稀疏矩阵表示优化。我们使用TAMU稀疏矩阵库使用编码加速器评估异构系统，研究了>369个不同的稀疏矩阵示例，发现几何平均性能优势为2.4倍。相比之下，CPU表现出较差的编码性能(差30倍)，使得数据表示优化不可行。当在一组7个代表性矩阵上进行评估时，保持SpMV性能不变，添加重新编码优化和加速器可以在基于DDR和hbm的内存系统上分别降低63%和51%的功耗。这些结果显示了这种新的异构体系结构方法的前景。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Programmable Acceleration for Sparse Matrices in a Data-Movement Limited World

Data movement cost is a critical performance concern in today's computing systems. We propose a heterogeneous architecture that combines a CPU core with an efficient data recoding accelerator and evaluate it on sparse matrix computation. Such computations underly a wide range of important computations such as partial differential equation solvers, sequence alignment, and machine learning and are often data movement limited. The data recoding accelerator is orders of magnitude more energy efficient than a conventional CPU for recoding, allowing sparse matrix representation to be optimized for data movement. We evaluate the heterogeneous system with a recoding accelerator using the TAMU sparse matrix library, studying >369 diverse sparse matrix examples finding geometric mean performance benefits of 2.4x. In contrast, CPU's exhibit poor recoding performance (up to 30x worse), making data representation optimization infeasible. Holding SpMV performance constant, adding the recoding optimization and accelerator can produce power reductions of 63% and 51% on DDR and HBM-based memory systems, respectively, when evaluated on a set of 7 representative matrices. These results show the promise of this new heterogeneous architecture approach.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

自引率

0.00%

发文量