Floating-point data compression at 75 Gb/s on a GPU

GPGPU-4, Pub Date: 2011-03-05, DOI: 10.1145/1964179.1964189

M. A. O'Neil, Martin Burtscher

Citations: 65

Abstract

Numeric simulations often generate large amounts of data that need to be stored or sent to other compute nodes. This paper investigates whether GPUs are powerful enough to make real-time data compression and decompression possible in such environments, that is, whether they can operate at the 32- or 40-Gb/s throughput of emerging network cards. The fastest parallel CPU-based floating-point data compression algorithm operates below 20 Gb/s on eight Xeon cores, which is significantly slower than the network speed and thus insufficient for compression to be practical in high-end networks. As a remedy, we have created the highly parallel GFC compression algorithm for double-precision floating-point data. This algorithm is specifically designed for GPUs. It compresses at a minimum of 75 Gb/s, decompresses at 90 Gb/s and above, and can therefore improve internode communication throughput on current and upcoming networks by fully saturating the interconnection links with compressed data.
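The saturation argument in the abstract can be made concrete with a little arithmetic. The sketch below is illustrative only and is not taken from the paper: it assumes the quoted compression and decompression rates are measured in uncompressed data, models the pipeline as three stages limited by the slowest one, and uses a hypothetical compression ratio of 1.5 because the abstract reports none. The function names are likewise invented for this example.

```python
def effective_throughput_gbps(link_gbps, compress_gbps, decompress_gbps, ratio):
    """Uncompressed-data rate of a compress -> send -> decompress pipeline.

    compress_gbps and decompress_gbps are assumed to be rates of uncompressed
    data. link_gbps is the rate at which the link carries (compressed) bits.
    ratio is uncompressed size / compressed size (a hypothetical input).
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    # The link moves link_gbps of compressed bits, which correspond to
    # link_gbps * ratio bits of original data. The slowest stage wins.
    return min(compress_gbps, link_gbps * ratio, decompress_gbps)


def link_is_saturated(link_gbps, compress_gbps, decompress_gbps, ratio):
    """True if the compressed stream fills the link completely."""
    throughput = effective_throughput_gbps(
        link_gbps, compress_gbps, decompress_gbps, ratio
    )
    return throughput / ratio >= link_gbps


# GPU rates from the abstract, 40 Gb/s link, hypothetical ratio of 1.5.
gpu = effective_throughput_gbps(40.0, 75.0, 90.0, 1.5)

# CPU compressor capped at 20 Gb/s (the abstract says "below 20 Gb/s");
# the matching decompression rate is an assumption.
cpu = effective_throughput_gbps(40.0, 20.0, 20.0, 1.5)
```

Under these assumptions the GPU pipeline delivers 60 Gb/s of original data over a 40 Gb/s link and keeps the link full, while the CPU pipeline is capped at 20 Gb/s, which is slower than sending the data uncompressed.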
