Space-efficient Relative Error Order Sketch over Data Streams

22nd International Conference on Data Engineering (ICDE'06) Pub Date : 2006-04-03 DOI:10.1109/ICDE.2006.145

Ying Zhang, Xuemin Lin, Jian Xu, Flip Korn, Wei Wang


Citations: 26

Abstract

We consider the problem of continuously maintaining order sketches over data streams with a relative rank error guarantee $\epsilon$. Novel space-efficient and one-scan randomised techniques are developed. Our first randomised algorithm can guarantee such a relative error precision $\epsilon$ with confidence $1 - \delta$ using $O(\frac{1}{\epsilon^2} \log \frac{1}{\delta} \log \epsilon^2 N)$ space, where $N$ is the number of data elements seen so far in a data stream. Then, a new one-scan space compression technique is developed. Combined with the first randomised algorithm, the one-scan space compression technique yields another one-scan randomised algorithm that guarantees the space requirement is $O(\frac{1}{\epsilon} \log(\frac{1}{\epsilon} \log \frac{1}{\delta}) \frac{\log^{2+\alpha} \epsilon N}{1 - 1/2^{\alpha}})$ (for $\alpha > 0$) on average, while the worst-case space remains $O(\frac{1}{\epsilon^2} \log \frac{1}{\delta} \log \epsilon^2 N)$. These results are immediately applicable to approximately computing quantiles over data streams with a relative error guarantee $\epsilon$ and significantly improve the previous best space bound $O(\frac{1}{\epsilon^3} \log \frac{1}{\delta} \log N)$. Our extensive experimental results demonstrate that both techniques can support on-line computation against high-speed data streams.
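To give a feel for the bounds quoted in the abstract, the following minimal Python sketch evaluates the worst-case space expression of the first algorithm next to the previous best bound, and checks a reported rank against the usual reading of a relative rank error guarantee (a reported rank r' for true rank r is acceptable when |r' - r| <= epsilon * r). It is an illustration only: the function names are hypothetical, the constants hidden by the O-notation are dropped, and base-2 logarithms are an arbitrary assumption, so only the ratio between the two expressions is meaningful, not the absolute values.

```python
import math


def new_worst_case_bound(eps: float, delta: float, n: int) -> float:
    """(1/eps^2) * log(1/delta) * log(eps^2 * n), with O() constants dropped.

    Assumes eps^2 * n > 1 so that the last logarithm is positive.
    """
    return (1 / eps**2) * math.log2(1 / delta) * math.log2(eps**2 * n)


def previous_bound(eps: float, delta: float, n: int) -> float:
    """(1/eps^3) * log(1/delta) * log(n), with O() constants dropped."""
    return (1 / eps**3) * math.log2(1 / delta) * math.log2(n)


def within_relative_rank_error(true_rank: int, reported_rank: int, eps: float) -> bool:
    """True when |reported_rank - true_rank| <= eps * true_rank."""
    return abs(reported_rank - true_rank) <= eps * true_rank


# Hypothetical parameters chosen only for illustration.
eps, delta, n = 0.01, 0.01, 10**9
ratio = previous_bound(eps, delta, n) / new_worst_case_bound(eps, delta, n)
print(f"previous / new (constants ignored): {ratio:.1f}")

# Rank 1000 with eps = 0.01 tolerates an error of up to 10 positions.
print(within_relative_rank_error(1000, 1009, eps))
print(within_relative_rank_error(1000, 1020, eps))
```

Because the tolerance scales with the rank itself, low ranks are held to a much tighter absolute error than high ranks, which is what distinguishes a relative guarantee from a uniform one of epsilon * N.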


