Ying Zhang, Xuemin Lin, Jian Xu, Flip Korn, Wei Wang
22nd International Conference on Data Engineering (ICDE'06), published 2006-04-03. DOI: 10.1109/ICDE.2006.145 (https://doi.org/10.1109/ICDE.2006.145)
Space-efficient Relative Error Order Sketch over Data Streams
We consider the problem of continuously maintaining order sketches over data streams with a relative rank error guarantee $\epsilon$. Novel space-efficient and one-scan randomised techniques are developed. Our first randomised algorithm can guarantee such a relative error precision $\epsilon$ with confidence $1 - \delta$ using $O\left(\frac{1}{\epsilon^2} \log \frac{1}{\delta} \log \epsilon^2 N\right)$ space, where $N$ is the number of data elements seen so far in a data stream. Then, a new one-scan space compression technique is developed. Combined with the first randomised algorithm, the one-scan space compression technique yields another one-scan randomised algorithm that guarantees the space requirement is $O\left(\frac{1}{\epsilon} \log\left(\frac{1}{\epsilon} \log \frac{1}{\delta}\right) \frac{\log^{2+\alpha} \epsilon N}{1 - 1/2^{\alpha}}\right)$ (for $\alpha > 0$) on average, while the worst-case space remains $O\left(\frac{1}{\epsilon^2} \log \frac{1}{\delta} \log \epsilon^2 N\right)$. These results are immediately applicable to approximately computing quantiles over data streams with a relative error guarantee $\epsilon$ and significantly improve the previous best space bound $O\left(\frac{1}{\epsilon^3} \log \frac{1}{\delta} \log N\right)$. Our extensive experimental results demonstrate that both techniques can support online computation against high-speed data streams.
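To make the guarantee concrete, the following is a minimal sketch of what a relative rank error check could look like. It assumes the usual reading of the term: a query for rank r is answered acceptably if the returned element has some rank r' with |r' - r| <= epsilon * r. That reading, and the helper names `rank_range` and `satisfies_relative_error`, are illustrative assumptions and not taken from the paper. The check is brute force over fully sorted data; it is not the sketch algorithm itself.

```python
import bisect


def rank_range(sorted_data, x):
    """Return the 1-based (lowest, highest) ranks that x occupies in sorted_data.

    Duplicates occupy a range of ranks; if x is absent, lowest > highest.
    """
    lowest = bisect.bisect_left(sorted_data, x) + 1
    highest = bisect.bisect_right(sorted_data, x)
    return lowest, highest


def satisfies_relative_error(sorted_data, r, answer, eps):
    """Check the assumed guarantee: some rank r' of answer has |r' - r| <= eps * r."""
    lowest, highest = rank_range(sorted_data, answer)
    if lowest > highest:
        return False  # the answer is not an element of the data
    nearest = min(max(r, lowest), highest)  # rank of answer closest to r
    return abs(nearest - r) <= eps * r


# Hypothetical data: the values 1..1000, so each value equals its own rank.
data = list(range(1, 1001))
eps = 0.1

low_rank_ok = satisfies_relative_error(data, 20, 21, eps)      # off by 1, tolerance 2
low_rank_bad = satisfies_relative_error(data, 20, 23, eps)     # off by 3, tolerance 2
high_rank_ok = satisfies_relative_error(data, 500, 540, eps)   # off by 40, tolerance 50
```

Under this reading the tolerance scales with the queried rank, so low ranks must be answered far more precisely than a uniform error of epsilon * N would require (here that would be 100 positions at every rank).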