Elf: Erasing-based Lossless Floating-Point Compression

Proc. VLDB Endow. Pub Date: 2023-03-01 DOI: 10.14778/3587136.3587149

Ruiyuan Li, Zheng Li, Yi Wu, Chao Chen, Yu Zheng


Citations: 3

Abstract

Floating-point time series data are being generated in prohibitively large volumes and at an unprecedentedly high rate. Efficient, compact, and lossless compression of time series data is therefore of great importance for a wide range of scenarios. Most existing lossless floating-point compression methods are based on the XOR operation, but they do not fully exploit trailing zeros, which usually results in an unsatisfactory compression ratio. This paper proposes an Erasing-based Lossless Floating-point compression algorithm, Elf. The main idea of Elf is to erase the last few bits of floating-point values (i.e., set them to zero), so that the XORed values are expected to contain many trailing zeros. The erasing-based method poses three challenges. First, how can the bits to erase be determined quickly? Second, how can the original data be recovered losslessly from the erased data? Third, how can the erased data be encoded compactly? Through rigorous mathematical analysis, Elf can directly determine the erased bits and restore the original values without losing any precision. To further improve the compression ratio, we propose a novel encoding strategy for XORed values with many trailing zeros. Elf works in a streaming fashion. It takes only O(N) time (where N is the length of a time series) and O(1) space, and achieves a notable compression ratio with a theoretical guarantee. Extensive experiments on 22 datasets show the strong performance of Elf compared with 9 advanced competitors.
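The erase-then-XOR idea in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: the abstract says Elf determines the erased bits analytically, whereas the hypothetical `erase` helper below simply tries every candidate bit count and keeps the largest one for which rounding to a known number of decimal places still recovers the original value. The assumption that the decimal precision is known in advance, the use of `round` for recovery, and all function names are illustrative choices, not taken from the source.

```python
import struct
from typing import Iterable, Iterator


def to_bits(x: float) -> int:
    """Reinterpret an IEEE 754 double as a 64-bit unsigned integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def from_bits(b: int) -> float:
    """Reinterpret a 64-bit unsigned integer as an IEEE 754 double."""
    return struct.unpack(">d", struct.pack(">Q", b))[0]


def trailing_zeros(x: int) -> int:
    """Count trailing zero bits of a 64-bit pattern (64 if x is 0)."""
    return 64 if x == 0 else (x & -x).bit_length() - 1


def erase(v: float, decimals: int) -> float:
    """Brute-force stand-in for erasing (assumption, not Elf's method):
    zero as many low mantissa bits as possible while rounding to
    `decimals` places still yields exactly v."""
    bits = to_bits(v)
    for k in range(52, 0, -1):          # try the most aggressive erase first
        candidate = from_bits((bits >> k) << k)
        if round(candidate, decimals) == v:
            return candidate
    return v                            # nothing could be erased safely


def xor_trailing_zeros(series: Iterable[float]) -> Iterator[int]:
    """Stream over the series keeping only the previous bit pattern
    (constant state), yielding trailing zeros of each consecutive XOR."""
    prev = None
    for v in series:
        cur = to_bits(v)
        if prev is not None:
            yield trailing_zeros(prev ^ cur)
        prev = cur


raw = [3.17, 3.18, 3.21]
erased = [erase(v, 2) for v in raw]
print("raw XOR trailing zeros:   ", list(xor_trailing_zeros(raw)))
print("erased XOR trailing zeros:", list(xor_trailing_zeros(erased)))
print("recovered:", [round(e, 2) for e in erased])
```

Because each erased value ends in a long run of zero bits, the XOR of two consecutive erased values does too, which is the property an XOR-based encoder can exploit. The generator keeps only one previous bit pattern, mirroring the streaming, constant-space behavior the abstract describes.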
