Sim-Piece:通过相似分段合并实现的高精度分段线性逼近

Xenophon Kitsios, Panagiotis Liakos, Katia Papakonstantinopoulou, Y. Kotidis
{"title":"Sim-Piece:通过相似分段合并实现的高精度分段线性逼近","authors":"Xenophon Kitsios, Panagiotis Liakos, Katia Papakonstantinopoulou, Y. Kotidis","doi":"10.14778/3594512.3594521","DOIUrl":null,"url":null,"abstract":"Approximating series of timestamped data points using a sequence of line segments with a maximum error guarantee is a fundamental data compression problem, termed as piecewise linear approximation (PLA). Due to the increasing need to analyze massive collections of time-series data in diverse domains, the problem has recently received significant attention, and recent PLA algorithms that have emerged do help us handle the overwhelming amount of information, at the cost of some precision loss. More specifically, these algorithms entail a trade-off between the maximum precision loss and the space savings achieved. However, advances in the area of lossless compression are undercutting the offerings of PLA techniques in real datasets. In this work, we propose Sim-Piece, a novel lossy compression algorithm for time-series data that optimizes the space requirements of representing PLA line segments, by finding the minimum number of groups we can organize these segments into, to represent them jointly. Our experimental evaluation demonstrates that our approach readily outperforms competing techniques, attaining compression ratios with more than twofold improvement on average over what PLA algorithms can offer. This allows for providing significantly higher accuracy with equivalent space requirements. Moreover, our algorithm, due to the simplicity of its merging phase, imposes little overhead while compacting the PLA description, offering a significantly improved trade-off between space and running time. The aforementioned benefits of our approach significantly improve the efficiency in which we can store time-series data, while allowing a tight maximum error in the representation of their values.","PeriodicalId":20467,"journal":{"name":"Proc. VLDB Endow.","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"Sim-Piece: Highly Accurate Piecewise Linear Approximation through Similar Segment Merging\",\"authors\":\"Xenophon Kitsios, Panagiotis Liakos, Katia Papakonstantinopoulou, Y. Kotidis\",\"doi\":\"10.14778/3594512.3594521\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Approximating series of timestamped data points using a sequence of line segments with a maximum error guarantee is a fundamental data compression problem, termed as piecewise linear approximation (PLA). Due to the increasing need to analyze massive collections of time-series data in diverse domains, the problem has recently received significant attention, and recent PLA algorithms that have emerged do help us handle the overwhelming amount of information, at the cost of some precision loss. More specifically, these algorithms entail a trade-off between the maximum precision loss and the space savings achieved. However, advances in the area of lossless compression are undercutting the offerings of PLA techniques in real datasets. In this work, we propose Sim-Piece, a novel lossy compression algorithm for time-series data that optimizes the space requirements of representing PLA line segments, by finding the minimum number of groups we can organize these segments into, to represent them jointly. Our experimental evaluation demonstrates that our approach readily outperforms competing techniques, attaining compression ratios with more than twofold improvement on average over what PLA algorithms can offer. This allows for providing significantly higher accuracy with equivalent space requirements. Moreover, our algorithm, due to the simplicity of its merging phase, imposes little overhead while compacting the PLA description, offering a significantly improved trade-off between space and running time. The aforementioned benefits of our approach significantly improve the efficiency in which we can store time-series data, while allowing a tight maximum error in the representation of their values.\",\"PeriodicalId\":20467,\"journal\":{\"name\":\"Proc. VLDB Endow.\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-04-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proc. VLDB Endow.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.14778/3594512.3594521\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proc. VLDB Endow.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.14778/3594512.3594521","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

摘要

使用具有最大误差保证的线段序列逼近一系列时间戳数据点是一个基本的数据压缩问题,称为分段线性逼近(PLA)。由于越来越需要分析不同领域的大量时间序列数据,这个问题最近受到了极大的关注,最近出现的PLA算法确实帮助我们处理了大量的信息,但代价是一些精度损失。更具体地说,这些算法需要在最大精度损失和节省空间之间进行权衡。然而,无损压缩领域的进步正在削弱PLA技术在真实数据集中的应用。在这项工作中,我们提出了Sim-Piece,一种新的时间序列数据有损压缩算法,通过找到我们可以组织这些线段的最小组数来共同表示PLA线段,从而优化表示PLA线段的空间要求。我们的实验评估表明,我们的方法很容易优于竞争技术,获得的压缩比平均比PLA算法可以提供的压缩比提高两倍以上。这允许在同等的空间要求下提供更高的精度。此外,我们的算法,由于其合并阶段的简单性,在压缩PLA描述时施加很少的开销,在空间和运行时间之间提供了显着改进的权衡。我们方法的上述优点显著提高了存储时间序列数据的效率,同时允许在其值的表示中有一个很小的最大误差。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Sim-Piece: Highly Accurate Piecewise Linear Approximation through Similar Segment Merging
Approximating series of timestamped data points using a sequence of line segments with a maximum error guarantee is a fundamental data compression problem, termed as piecewise linear approximation (PLA). Due to the increasing need to analyze massive collections of time-series data in diverse domains, the problem has recently received significant attention, and recent PLA algorithms that have emerged do help us handle the overwhelming amount of information, at the cost of some precision loss. More specifically, these algorithms entail a trade-off between the maximum precision loss and the space savings achieved. However, advances in the area of lossless compression are undercutting the offerings of PLA techniques in real datasets. In this work, we propose Sim-Piece, a novel lossy compression algorithm for time-series data that optimizes the space requirements of representing PLA line segments, by finding the minimum number of groups we can organize these segments into, to represent them jointly. Our experimental evaluation demonstrates that our approach readily outperforms competing techniques, attaining compression ratios with more than twofold improvement on average over what PLA algorithms can offer. This allows for providing significantly higher accuracy with equivalent space requirements. Moreover, our algorithm, due to the simplicity of its merging phase, imposes little overhead while compacting the PLA description, offering a significantly improved trade-off between space and running time. The aforementioned benefits of our approach significantly improve the efficiency in which we can store time-series data, while allowing a tight maximum error in the representation of their values.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信