Lightweight Huffman Coding for Efficient GPU Compression

Proceedings of the 37th International Conference on Supercomputing Pub Date : 2023-06-21 DOI:10.1145/3577193.3593736

Milan Shah, Xiaodong Yu, S. Di, M. Becchi, F. Cappello

{"title":"Lightweight Huffman Coding for Efficient GPU Compression","authors":"Milan Shah, Xiaodong Yu, S. Di, M. Becchi, F. Cappello","doi":"10.1145/3577193.3593736","DOIUrl":null,"url":null,"abstract":"Lossy compression is often deployed in scientific applications to reduce data footprint and improve data transfers and I/O performance. Especially for applications requiring on-the-flight compression, it is essential to minimize compression's runtime. In this paper, we design a scheme to improve the performance of cuSZ, a GPU-based lossy compressor. We observe that Huffman coding - used by cuSZ to compress metadata generated during compression - incurs a performance overhead that can be significant, especially for smaller datasets. Our work seeks to reduce the Huffman coding runtime with minimal-to-no impact on cuSZ's compression efficiency. Our contributions are as follows. First, we examine a variety of probability distributions to determine which distributions closely model the input to cuSZ's Huffman coding stage. From these distributions, we create a dictionary of pre-computed codebooks such that during compression, a codebook is selected from the dictionary instead of computing a custom codebook. Second, we explore three codebook selection criteria to be applied at runtime. Finally, we evaluate our scheme on real-world datasets and in the context of two important application use cases, HDF5 and MPI, using an NVIDIA A100 GPU. Our evaluation shows that our method can reduce the Huffman coding penalty by a factor of 78--92×, translating to a total speedup of up to 5× over baseline cuSZ. Smaller HDF5 chunk sizes enjoy over an 8× speedup in compression and MPI messages on the scale of tens of MB have a 1.4--30.5× speedup in communication time.","PeriodicalId":424155,"journal":{"name":"Proceedings of the 37th International Conference on Supercomputing","volume":"37 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 37th International Conference on Supercomputing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3577193.3593736","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

Abstract

Lossy compression is often deployed in scientific applications to reduce data footprint and improve data transfers and I/O performance. Especially for applications requiring on-the-flight compression, it is essential to minimize compression's runtime. In this paper, we design a scheme to improve the performance of cuSZ, a GPU-based lossy compressor. We observe that Huffman coding - used by cuSZ to compress metadata generated during compression - incurs a performance overhead that can be significant, especially for smaller datasets. Our work seeks to reduce the Huffman coding runtime with minimal-to-no impact on cuSZ's compression efficiency. Our contributions are as follows. First, we examine a variety of probability distributions to determine which distributions closely model the input to cuSZ's Huffman coding stage. From these distributions, we create a dictionary of pre-computed codebooks such that during compression, a codebook is selected from the dictionary instead of computing a custom codebook. Second, we explore three codebook selection criteria to be applied at runtime. Finally, we evaluate our scheme on real-world datasets and in the context of two important application use cases, HDF5 and MPI, using an NVIDIA A100 GPU. Our evaluation shows that our method can reduce the Huffman coding penalty by a factor of 78--92×, translating to a total speedup of up to 5× over baseline cuSZ. Smaller HDF5 chunk sizes enjoy over an 8× speedup in compression and MPI messages on the scale of tens of MB have a 1.4--30.5× speedup in communication time.

查看原文本刊更多论文

高效GPU压缩的轻量级霍夫曼编码

有损压缩通常部署在科学应用程序中，以减少数据占用，提高数据传输和I/O性能。特别是对于需要动态压缩的应用程序，最小化压缩的运行时间是至关重要的。本文设计了一种改进基于gpu的有损压缩器cuSZ性能的方案。我们观察到，Huffman编码——cuSZ用来压缩压缩过程中生成的元数据——会产生很大的性能开销，特别是对于较小的数据集。我们的工作旨在减少霍夫曼编码运行时，对cuSZ的压缩效率影响最小甚至没有影响。我们的贡献如下。首先，我们检查了各种概率分布，以确定哪些分布与cuSZ的霍夫曼编码阶段的输入密切相关。从这些分布中，我们创建了一个预先计算的码本字典，以便在压缩过程中，从字典中选择一个码本，而不是计算自定义码本。其次，我们探讨了运行时应用的三个码本选择标准。最后，我们使用NVIDIA A100 GPU在现实世界数据集和两个重要应用用例HDF5和MPI的背景下评估我们的方案。我们的评估表明，我们的方法可以将霍夫曼编码惩罚减少78- 92倍，在基线cuSZ上转化为高达5倍的总加速。较小的HDF5块大小在压缩方面有超过8倍的加速，而几十MB的MPI消息在通信时间方面有1.4- 30.5倍的加速。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Proceedings of the 37th International Conference on Supercomputing

自引率

0.00%

发文量