Space-efficient Relative Error Order Sketch over Data Streams

Ying Zhang, Xuemin Lin, Jian Xu, Flip Korn, Wei Wang
22nd International Conference on Data Engineering (ICDE'06), p. 51, published 2006-04-03
DOI: 10.1109/ICDE.2006.145 (https://doi.org/10.1109/ICDE.2006.145)
Citations: 26

Abstract

We consider the problem of continuously maintaining order sketches over data streams with a relative rank error guarantee $\epsilon$. Novel space-efficient, one-scan randomised techniques are developed. Our first randomised algorithm guarantees relative error precision $\epsilon$ with confidence $1 - \delta$ using $O(\frac{1}{\epsilon^2} \log \frac{1}{\delta} \log \epsilon^2 N)$ space, where $N$ is the number of data elements seen so far in the data stream. We then develop a new one-scan space compression technique. Combined with the first randomised algorithm, it yields another one-scan randomised algorithm whose space requirement is $O(\frac{1}{\epsilon} \log(\frac{1}{\epsilon} \log \frac{1}{\delta}) \frac{\log^{2+\alpha} \epsilon N}{1 - 1/2^{\alpha}})$ (for $\alpha > 0$) on average, while the worst-case space remains $O(\frac{1}{\epsilon^2} \log \frac{1}{\delta} \log \epsilon^2 N)$. These results are immediately applicable to approximately computing quantiles over data streams with a relative error guarantee $\epsilon$, and they significantly improve the previous best space bound of $O(\frac{1}{\epsilon^3} \log \frac{1}{\delta} \log N)$. Our extensive experimental results demonstrate that both techniques can support on-line computation against high-speed data streams.
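To make the guarantee and the bounds above concrete, here is a minimal Python sketch. It is not the paper's algorithm. It assumes the usual reading of a relative rank error guarantee: a query for rank $r$ may be answered by any element whose rank lies in $[(1-\epsilon)r, (1+\epsilon)r]$. The function names, the base-2 logarithm, and the default $\alpha = 1$ are illustrative assumptions. The second function only evaluates the leading terms of the three bounds with all constant factors dropped.

```python
import math
from bisect import bisect_left, bisect_right


def within_relative_rank_error(sorted_data, answer, r, eps):
    """Check an answer to a rank-r query against the relative guarantee.

    Assumed definition: the answer is acceptable if one of its 1-based
    ranks in sorted_data falls in [(1 - eps) * r, (1 + eps) * r].
    """
    lo = bisect_left(sorted_data, answer) + 1   # smallest rank of answer
    hi = bisect_right(sorted_data, answer)      # largest rank of answer
    if hi < lo:                                 # answer does not occur in the data
        return False
    return lo <= (1 + eps) * r and hi >= (1 - eps) * r


def space_terms(eps, delta, n, alpha=1.0):
    """Leading terms of the abstract's bounds, constant factors dropped.

    Requires eps**2 * n > 1 so that every logarithm is positive.
    Returns (first algorithm / worst case, compressed average case,
    previous best bound).
    """
    log = math.log2  # the base is an assumption; it only affects constants
    first = (1 / eps**2) * log(1 / delta) * log(eps**2 * n)
    compressed_avg = ((1 / eps) * log((1 / eps) * log(1 / delta))
                      * log(eps * n) ** (2 + alpha) / (1 - 1 / 2**alpha))
    previous = (1 / eps**3) * log(1 / delta) * log(n)
    return first, compressed_avg, previous


if __name__ == "__main__":
    data = list(range(1, 1001))  # element v has rank v
    print(within_relative_rank_error(data, 105, r=100, eps=0.1))
    print(within_relative_rank_error(data, 120, r=100, eps=0.1))
    print(space_terms(eps=0.01, delta=0.01, n=10**9))
```

Because the hidden constants are omitted, the values returned by `space_terms` indicate only how each expression grows with $\epsilon$, $\delta$ and $N$. They are not estimates of actual memory use.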