Floating-point data compression at 75 Gb/s on a GPU

GPGPU-4 · Pub Date: 2011-03-05 · DOI: 10.1145/1964179.1964189
M. A. O'Neil, Martin Burtscher
Citations: 65

Abstract

Numeric simulations often generate large amounts of data that need to be stored or sent to other compute nodes. This paper investigates whether GPUs are powerful enough to make real-time data compression and decompression possible in such environments, that is, whether they can operate at the 32- or 40-Gb/s throughput of emerging network cards. The fastest parallel CPU-based floating-point data compression algorithm operates below 20 Gb/s on eight Xeon cores, which is significantly slower than the network speed and thus insufficient for compression to be practical in high-end networks. As a remedy, we have created the highly parallel GFC compression algorithm for double-precision floating-point data. This algorithm is specifically designed for GPUs. It compresses at a minimum of 75 Gb/s, decompresses at 90 Gb/s and above, and can therefore improve internode communication throughput on current and upcoming networks by fully saturating the interconnection links with compressed data.
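The abstract argues that compression only pays off when the codec is faster than the link. The sketch below makes that reasoning concrete with a simple bottleneck model. It assumes that compression, transmission and decompression overlap perfectly, so the slowest stage sets the rate of original (uncompressed) data delivered. The link carries compressed bytes, so it moves `link * ratio` Gb/s of original data. The compression ratio of 1.5 is a hypothetical value chosen for illustration, not a result reported in the paper. The CPU codec is modeled at 20 Gb/s in both directions, which is an optimistic stand-in for "below 20 Gb/s".

```python
def effective_throughput(compress_gbps, link_gbps, ratio, decompress_gbps):
    """Original-data throughput (Gb/s) of a pipelined compress -> send -> decompress path.

    Assumption: the three stages overlap perfectly, so the slowest one bounds the rate.
    The link transmits compressed data, so it delivers link_gbps * ratio of original data.
    """
    return min(compress_gbps, link_gbps * ratio, decompress_gbps)


LINK = 40.0   # Gb/s, one of the network speeds cited in the abstract
RATIO = 1.5   # hypothetical compression ratio, for illustration only

# CPU codec: 20 Gb/s used as an optimistic upper bound for both directions (assumption).
cpu = effective_throughput(20.0, LINK, RATIO, 20.0)
# GPU codec: the abstract's quoted minimums, 75 Gb/s compression and 90 Gb/s decompression.
gpu = effective_throughput(75.0, LINK, RATIO, 90.0)

print(f"raw link: {LINK} Gb/s, CPU path: {cpu} Gb/s, GPU path: {gpu} Gb/s")
```

Under these assumptions the CPU path delivers 20 Gb/s, which is slower than sending uncompressed data over the 40 Gb/s link. The GPU path is limited by the link itself (40 × 1.5 = 60 Gb/s of original data). This is the "fully saturating the interconnection links with compressed data" regime the abstract describes.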