Ying Zhang, Xuemin Lin, Jian Xu, Flip Korn, Wei Wang
Title: Space-efficient Relative Error Order Sketch over Data Streams
Venue: 22nd International Conference on Data Engineering (ICDE'06), p. 51
Published: 2006-04-03
DOI: https://doi.org/10.1109/ICDE.2006.145
Citations: 26
Abstract
We consider the problem of continuously maintaining order sketches over data streams with a relative rank error guarantee $\epsilon$. Novel space-efficient, one-scan randomised techniques are developed. Our first randomised algorithm guarantees relative error precision $\epsilon$ with confidence $1 - \delta$ using $O(\frac{1}{\epsilon^2} \log \frac{1}{\delta} \log \epsilon^2 N)$ space, where $N$ is the number of data elements seen so far in the data stream. We then develop a new one-scan space compression technique. Combined with the first randomised algorithm, it yields another one-scan randomised algorithm whose space requirement is $O(\frac{1}{\epsilon} \log(\frac{1}{\epsilon} \log \frac{1}{\delta}) \frac{\log^{2+\alpha} \epsilon N}{1 - 1/2^{\alpha}})$ (for $\alpha > 0$) on average, while the worst-case space remains $O(\frac{1}{\epsilon^2} \log \frac{1}{\delta} \log \epsilon^2 N)$. These results are immediately applicable to approximately computing quantiles over data streams with a relative error guarantee $\epsilon$, and they significantly improve on the previous best space bound of $O(\frac{1}{\epsilon^3} \log \frac{1}{\delta} \log N)$. Our extensive experimental results demonstrate that both techniques can support on-line computation against high-speed data streams.
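To get a feel for how the three space bounds quoted in the abstract scale, the following minimal sketch evaluates each expression for given values of the error, confidence, and stream-length parameters. It is only indicative: the constants hidden by the O-notation are dropped, the logarithm base is assumed to be 2 (the abstract does not state one), and the function names are hypothetical and not taken from the paper.

```python
import math


def first_algorithm_space(eps: float, delta: float, n: float) -> float:
    """(1/eps^2) * log(1/delta) * log(eps^2 * n), constants dropped (assumption)."""
    return (1 / eps**2) * math.log2(1 / delta) * math.log2(eps**2 * n)


def compressed_average_space(eps: float, delta: float, n: float, alpha: float) -> float:
    """Average-case expression of the compressed algorithm, constants dropped.

    (1/eps) * log((1/eps) * log(1/delta)) * log^(2+alpha)(eps * n) / (1 - 1/2^alpha)
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    outer = (1 / eps) * math.log2((1 / eps) * math.log2(1 / delta))
    return outer * math.log2(eps * n) ** (2 + alpha) / (1 - 1 / 2**alpha)


def previous_best_space(eps: float, delta: float, n: float) -> float:
    """(1/eps^3) * log(1/delta) * log(n), constants dropped (assumption)."""
    return (1 / eps**3) * math.log2(1 / delta) * math.log2(n)


if __name__ == "__main__":
    # Illustrative parameter choices only; they do not come from the paper.
    for eps in (0.1, 0.01, 0.001):
        delta, n, alpha = 0.01, 10**9, 1.0
        print(
            eps,
            first_algorithm_space(eps, delta, n),
            compressed_average_space(eps, delta, n, alpha),
            previous_best_space(eps, delta, n),
        )
```

Because the hidden constants are unknown, the printed numbers should be read as growth trends rather than as actual memory footprints. For any $0 < \epsilon < 1$ the first expression is smaller than the previous best one, since $1/\epsilon^2 < 1/\epsilon^3$ and $\log \epsilon^2 N < \log N$.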