A Fast Algorithm for Approximate Quantiles in High Speed Data Streams

19th International Conference on Scientific and Statistical Database Management (SSDBM 2007) | Pub Date: 2007-07-09 | DOI: 10.1109/SSDBM.2007.27

Qi Zhang, Wei Wang


Citations: 40

Abstract

We present a fast algorithm for computing approximate quantiles in high speed data streams with deterministic error bounds. For data streams of size N, where N is unknown in advance, our algorithm partitions the stream into sub-streams of exponentially increasing size as they arrive. For each sub-stream, which has a fixed size, we compute and maintain a multi-level summary structure using a novel algorithm. To achieve high speed performance, the algorithm uses simple block-wise merge and sample operations. Overall, our algorithms for fixed-size streams and arbitrary-size streams have a computational cost of O(N log((1/ε) log(εN))) and an average per-element update cost of O(log log N) if ε is fixed.
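The abstract names three ingredients: sub-streams of exponentially increasing size, a multi-level summary per sub-stream, and block-wise merge and sample operations. The Python sketch below illustrates how such pieces could fit together. It is not the authors' algorithm and carries none of the paper's error guarantees. The names `exponential_substreams`, `merge_and_sample`, and `MultiLevelSummary`, the doubling factor, and the "keep every second element" sampling rule are all assumptions made for illustration.

```python
from heapq import merge
from itertools import islice


def exponential_substreams(stream, first_size):
    """Yield chunks of size first_size, 2*first_size, 4*first_size, ...

    The total length of `stream` does not need to be known in advance;
    the final chunk may be shorter than its nominal size.
    """
    it = iter(stream)
    size = first_size
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk
        size *= 2  # assumed growth factor of 2


def merge_and_sample(a, b):
    """Merge two sorted blocks, then keep every second element."""
    merged = list(merge(a, b))
    return merged[1::2]


class MultiLevelSummary:
    """Hypothetical multi-level summary for one fixed-size sub-stream.

    An element stored at level i stands in for roughly 2**i inputs.
    """

    def __init__(self, block_size):
        self.block_size = block_size
        self.buffer = []  # unsorted level-0 block still being filled
        self.levels = []  # levels[i] is a sorted block or None

    def insert(self, x):
        self.buffer.append(x)
        if len(self.buffer) == self.block_size:
            carry = sorted(self.buffer)
            self.buffer = []
            self._carry_up(carry)

    def _carry_up(self, carry):
        # Like a binary counter: an occupied level is merged with the
        # carry, sampled back to one block, and pushed one level higher.
        for i in range(len(self.levels)):
            if self.levels[i] is None:
                self.levels[i] = carry
                return
            carry = merge_and_sample(self.levels[i], carry)
            self.levels[i] = None
        self.levels.append(carry)
```

In this sketch, one `MultiLevelSummary` would be created per chunk yielded by `exponential_substreams`. Most insertions only append to the level-0 buffer, and the sorting and merging work happens once per full block, which is the kind of block-wise processing the abstract credits for its speed.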
