A parallel Huffman coder on the CUDA architecture

2014 IEEE Visual Communications and Image Processing Conference. Pub Date: 2014-12-01. DOI: 10.1109/VCIP.2014.7051566

Habibelahi Rahmani, C. Topal, C. Akinlar


Citations: 11

Abstract


We present a parallel implementation of the widely-used entropy encoding algorithm, the Huffman coder, on the NVIDIA CUDA architecture. After constructing the Huffman codeword tree serially, we proceed in parallel by generating a byte stream where each byte represents a single bit of the compressed output stream. The final step is then to combine each consecutive 8 bytes into a single byte in parallel to generate the final compressed output bit stream. Experimental results show that we can achieve up to 22× speedups compared to the serial CPU implementation without any constraint on the maximum codeword length or data entropy.
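The pipeline described in the abstract can be illustrated with a small sketch. It is written in plain Python rather than CUDA, and it is not the authors' implementation: each `for` loop marked as a "thread" loop stands in for work that one GPU thread might do independently. The abstract does not say how each symbol finds its write position in the byte stream; the exclusive prefix sum over codeword lengths used here is an assumption, and the names `build_codes`, `encode`, and `decode` are hypothetical.

```python
import heapq
from collections import Counter
from itertools import accumulate


def build_codes(data):
    """Serial step: build a Huffman code table {symbol: bit string}."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:  # degenerate input with one distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: partial code}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w0, _, left = heapq.heappop(heap)
        w1, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w0 + w1, tie, merged))
        tie += 1
    return heap[0][2]


def encode(data):
    """Return (packed bytes, number of valid bits, code table)."""
    codes = build_codes(data)
    lengths = [len(codes[s]) for s in data]
    # Assumption: an exclusive prefix sum gives each symbol its start offset.
    offsets = [0] + list(accumulate(lengths))
    total_bits = offsets[-1]
    padded = (total_bits + 7) // 8 * 8

    # Stage 1: one "thread" per input symbol writes one byte per output bit.
    bit_bytes = bytearray(padded)
    for i, s in enumerate(data):
        for k, b in enumerate(codes[s]):
            bit_bytes[offsets[i] + k] = 1 if b == "1" else 0

    # Stage 2: one "thread" per output byte packs 8 consecutive bytes,
    # most significant bit first.
    packed = bytearray(padded // 8)
    for j in range(len(packed)):
        value = 0
        for b in bit_bytes[8 * j: 8 * j + 8]:
            value = (value << 1) | b
        packed[j] = value
    return bytes(packed), total_bits, codes


def decode(packed, total_bits, codes):
    """Serial reference decoder, used only to check the round trip."""
    inverse = {c: s for s, c in codes.items()}
    out, current = bytearray(), ""
    for n in range(total_bits):
        current += str((packed[n // 8] >> (7 - n % 8)) & 1)
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return bytes(out)


if __name__ == "__main__":
    packed, total_bits, codes = encode(b"abracadabra")
    print(total_bits, len(packed))
    print(decode(packed, total_bits, codes))
```

Because every symbol writes to a disjoint range of the intermediate byte stream, and every output byte reads a disjoint group of 8 bytes, neither stage needs synchronization between workers in this sketch. That independence is what makes the one-byte-per-bit intermediate form convenient, at the cost of an intermediate buffer eight times the size of the final output.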
