Elf: Erasing-based Lossless Floating-Point Compression

Ruiyuan Li, Zheng Li, Yi Wu, Chao Chen, Yu Zheng
Proc. VLDB Endow. (journal article), published 2023-03-01. DOI: 10.14778/3587136.3587149 (https://doi.org/10.14778/3587136.3587149)
Citations: 3

Abstract

Floating-point time series data are being generated in prohibitively large volumes and at an unprecedentedly high rate. Efficient, compact, and lossless compression of time series data is therefore of great importance in a wide range of scenarios. Most existing lossless floating-point compression methods are based on the XOR operation, but they do not fully exploit trailing zeros, which usually results in an unsatisfactory compression ratio. This paper proposes Elf, an Erasing-based Lossless Floating-point compression algorithm. The main idea of Elf is to erase the last few bits of each floating-point value (i.e., set them to zero), so that the XORed values contain many trailing zeros. The erasing-based method poses three challenges: how to quickly determine the bits to erase, how to losslessly recover the original data from the erased values, and how to compactly encode the erased data. Through rigorous mathematical analysis, Elf directly determines the erased bits and restores the original values without losing any precision. To further improve the compression ratio, the paper proposes a novel encoding strategy for XORed values with many trailing zeros. Elf works in a streaming fashion: it takes only O(N) time (where N is the length of a time series) and O(1) space, and it achieves a notable compression ratio with a theoretical guarantee. Extensive experiments on 22 datasets show that Elf performs strongly against 9 advanced competitors.
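The erasing idea in the abstract can be illustrated with a small Python sketch. This is not the authors' procedure: Elf determines the erased bits directly through mathematical analysis, whereas the sketch below uses a brute-force search and assumes that the number of decimal places of each value is already known. The helper names (`to_bits`, `from_bits`, `trailing_zeros`, `erase`) and the sample values are hypothetical and chosen only for illustration.

```python
import struct

def to_bits(x: float) -> int:
    """Reinterpret a double as its 64-bit IEEE 754 pattern."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def from_bits(b: int) -> float:
    """Reinterpret a 64-bit pattern as a double."""
    return struct.unpack(">d", struct.pack(">Q", b))[0]

def trailing_zeros(b: int) -> int:
    """Count trailing zero bits of a 64-bit pattern (64 if b == 0)."""
    if b == 0:
        return 64
    return (b & -b).bit_length() - 1

def erase(x: float, decimals: int) -> float:
    """Zero as many low mantissa bits as possible while rounding to
    `decimals` places still yields x (brute force; an assumption of this
    sketch, not Elf's direct computation)."""
    bits = to_bits(x)
    best = x
    for k in range(1, 53):  # the mantissa of a double has 52 bits
        mask = ~((1 << k) - 1) & 0xFFFFFFFFFFFFFFFF
        candidate = from_bits(bits & mask)
        if round(candidate, decimals) == x:
            best = candidate  # still recoverable, try erasing one more bit
        else:
            break  # erasing further only moves the value farther from x
    return best

# Two consecutive values with two decimal places (hypothetical sample).
a, b = 3.17, 3.18
ea, eb = erase(a, 2), erase(b, 2)

print("trailing zeros of XOR, original:", trailing_zeros(to_bits(a) ^ to_bits(b)))
print("trailing zeros of XOR, erased:  ", trailing_zeros(to_bits(ea) ^ to_bits(eb)))
print("recovered:", round(ea, 2), round(eb, 2))
```

Recovery here relies on rounding to the known decimal precision, which is why erasing loses no information in this sketch; how the precision is conveyed to the decompressor and how the XORed values are encoded are left out.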