Copernicus: Characterizing the Performance Implications of Compression Formats Used in Sparse Workloads

Bahar Asgari, Ramyad Hadidi, Joshua Dierberger, Charlotte Steinichen, Hyesoon Kim
{"title":"哥白尼:描述稀疏工作负载中使用的压缩格式的性能含义","authors":"Bahar Asgari, Ramyad Hadidi, Joshua Dierberger, Charlotte Steinichen, Hyesoon Kim","doi":"10.1109/IISWC53511.2021.00012","DOIUrl":null,"url":null,"abstract":"Sparse matrices are the key ingredients of several application domains, from scientific computing to machine learning. The primary challenge with sparse matrices has been efficiently storing and transferring data, for which many sparse formats have been proposed to significantly eliminate zero entries. Such formats, essentially designed to optimize memory footprint, may not be as successful in performing faster processing. In other words, although they allow faster data transfer and improve memory bandwidth utilization - the classic challenge of sparse problems - their decompression mechanism can potentially create a computation bottleneck. Not only is this challenge not resolved, but also it becomes more serious with the advent of domain-specific architectures (DSAs), as they intend to more aggressively improve performance. The performance implications of using various formats along with DSAs, however, has not been extensively studied by prior work. To fill this gap of knowledge, we characterize the impact of using seven frequently used compression formats on performance, based on a DSA for sparse matrix-vector multiplication (SpMV), implemented on an FPGA using high-level synthesis (HLS) tools, a growing and popular method for developing DSAs. Seeking a fair comparison, we tailor and well-optimize the HLS implementation of decompression for each format. We explore metrics, including decompression overhead, latency, balance ratio, throughput, memory bandwidth utilization, resource utilization, and power consumption, on a variety of real-world and synthetic sparse workloads.","PeriodicalId":203713,"journal":{"name":"2021 IEEE International Symposium on Workload Characterization (IISWC)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":"{\"title\":\"Copernicus: Characterizing the Performance Implications of Compression Formats Used in Sparse Workloads\",\"authors\":\"Bahar Asgari, Ramyad Hadidi, Joshua Dierberger, Charlotte Steinichen, Hyesoon Kim\",\"doi\":\"10.1109/IISWC53511.2021.00012\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Sparse matrices are the key ingredients of several application domains, from scientific computing to machine learning. The primary challenge with sparse matrices has been efficiently storing and transferring data, for which many sparse formats have been proposed to significantly eliminate zero entries. Such formats, essentially designed to optimize memory footprint, may not be as successful in performing faster processing. In other words, although they allow faster data transfer and improve memory bandwidth utilization - the classic challenge of sparse problems - their decompression mechanism can potentially create a computation bottleneck. Not only is this challenge not resolved, but also it becomes more serious with the advent of domain-specific architectures (DSAs), as they intend to more aggressively improve performance. The performance implications of using various formats along with DSAs, however, has not been extensively studied by prior work. 
To fill this gap of knowledge, we characterize the impact of using seven frequently used compression formats on performance, based on a DSA for sparse matrix-vector multiplication (SpMV), implemented on an FPGA using high-level synthesis (HLS) tools, a growing and popular method for developing DSAs. Seeking a fair comparison, we tailor and well-optimize the HLS implementation of decompression for each format. We explore metrics, including decompression overhead, latency, balance ratio, throughput, memory bandwidth utilization, resource utilization, and power consumption, on a variety of real-world and synthetic sparse workloads.\",\"PeriodicalId\":203713,\"journal\":{\"name\":\"2021 IEEE International Symposium on Workload Characterization (IISWC)\",\"volume\":\"45 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-11-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"9\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE International Symposium on Workload Characterization (IISWC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IISWC53511.2021.00012\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE International Symposium on Workload Characterization (IISWC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IISWC53511.2021.00012","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 9

Abstract

Sparse matrices are key ingredients of several application domains, from scientific computing to machine learning. The primary challenge with sparse matrices has been efficiently storing and transferring data, for which many sparse formats have been proposed to eliminate zero entries. Such formats, essentially designed to optimize memory footprint, may not be as successful at enabling faster processing. In other words, although they allow faster data transfer and improve memory bandwidth utilization - the classic challenge of sparse problems - their decompression mechanism can create a computation bottleneck. Not only is this challenge unresolved, but it also becomes more serious with the advent of domain-specific architectures (DSAs), as they aim to improve performance more aggressively. The performance implications of using various formats along with DSAs, however, have not been extensively studied by prior work. To fill this gap in knowledge, we characterize the impact of seven frequently used compression formats on performance, based on a DSA for sparse matrix-vector multiplication (SpMV) implemented on an FPGA using high-level synthesis (HLS) tools, a growing and popular method for developing DSAs. Seeking a fair comparison, we tailor and optimize the HLS implementation of decompression for each format. We explore metrics including decompression overhead, latency, balance ratio, throughput, memory bandwidth utilization, resource utilization, and power consumption, on a variety of real-world and synthetic sparse workloads.
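For context, the sketch below shows SpMV over the compressed sparse row (CSR) format, a commonly used sparse format (the abstract does not name the seven formats studied, so CSR is used here only as an assumed, representative example). It illustrates the interplay the abstract describes: the index traversal that "decompresses" the format is fused with the multiply-accumulate, so a format with expensive traversal can turn decompression into the computation bottleneck. This is a plain C sketch, not the paper's HLS implementation.

```c
#include <stddef.h>

/* Illustrative CSR SpMV kernel computing y = A * x.
 * CSR stores only the nonzero values of A, together with their column
 * indices (col_idx) and per-row offsets (row_ptr). The inner loop walks
 * these index arrays - the on-the-fly "decompression" work - while
 * performing the multiply-accumulate. */
void spmv_csr(size_t n_rows,
              const size_t *row_ptr,  /* n_rows + 1 offsets into values/col_idx */
              const size_t *col_idx,  /* column index of each nonzero */
              const double *values,   /* nonzero values of A */
              const double *x,        /* dense input vector */
              double *y)              /* dense output vector */
{
    for (size_t i = 0; i < n_rows; ++i) {
        double acc = 0.0;
        /* Index traversal and compute are interleaved; a format with
         * cheaper traversal shifts the balance toward useful compute. */
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            acc += values[k] * x[col_idx[k]];
        y[i] = acc;
    }
}
```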