Continuously maintaining quantile summaries of the most recent N elements over a data stream
Xuemin Lin, Hongjun Lu, Jian Xu, J. Yu
Proceedings. 20th International Conference on Data Engineering, 2004-03-30
DOI: 10.1109/ICDE.2004.1320011 (https://doi.org/10.1109/ICDE.2004.1320011)
Statistics over the most recently observed data elements are often required in applications involving data streams, such as intrusion detection in network monitoring, stock price prediction in financial markets, Web log mining for access prediction, and user click stream mining for personalization. Among the various statistics, computing a quantile summary is probably the most challenging because of its complexity. We study the problem of continuously maintaining a quantile summary of the most recently observed N elements over a stream so that quantile queries can be answered with a guaranteed precision of $\epsilon N$. We developed a space-efficient algorithm for predefined N that requires only one scan of the input data stream and $O\left(\frac{\log(\epsilon^2 N)}{\epsilon} + \frac{1}{\epsilon^2}\right)$ space in the worst case. We also developed an algorithm that maintains quantile summaries for the most recent N elements so that quantile queries on any most recent n elements ($n \le N$) can be answered with a guaranteed precision of $\epsilon n$. The worst-case space requirement for this algorithm is only $O\left(\frac{\log^2(\epsilon N)}{\epsilon^2}\right)$. Our performance study indicated that not only is the actual quantile estimation error far below the guaranteed precision, but the space requirement is also much less than the given theoretical bound.
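To make the precision guarantee concrete, the following Python sketch shows what it means for an answer to be within a rank error of εn over a sliding window. It is not the algorithm from the paper: it is a naive exact baseline that stores all N window elements (O(N) space), which is precisely the cost a compact summary is meant to avoid. The names `ExactWindowQuantiles`, `rank_range`, and `is_eps_approximate` are hypothetical, and the rank convention (1-based, target rank ⌈φn⌉) is an assumption made for illustration.

```python
import bisect
import math
from collections import deque


class ExactWindowQuantiles:
    """Naive baseline (an assumption for illustration, not the paper's
    summary structure): keeps every one of the most recent N elements."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.window = deque()     # elements in arrival order, oldest first
        self.sorted_items = []    # the same elements, kept sorted

    def insert(self, value):
        # Expire the oldest element once the window is full.
        if len(self.window) == self.window_size:
            oldest = self.window.popleft()
            idx = bisect.bisect_left(self.sorted_items, oldest)
            del self.sorted_items[idx]
        self.window.append(value)
        bisect.insort(self.sorted_items, value)

    def quantile(self, phi):
        """Return the element of 1-based rank ceil(phi * n) in the window."""
        n = len(self.sorted_items)
        rank = max(1, math.ceil(phi * n))
        return self.sorted_items[rank - 1]

    def rank_range(self, value):
        """Return the (lowest, highest) 1-based ranks held by value.
        If value is absent from the window, highest < lowest."""
        lowest = bisect.bisect_left(self.sorted_items, value) + 1
        highest = bisect.bisect_right(self.sorted_items, value)
        return lowest, highest


def is_eps_approximate(window, phi, answer, eps):
    """True if answer is in the window and one of its ranks lies
    within eps * n of the target rank ceil(phi * n)."""
    n = len(window.sorted_items)
    target = max(1, math.ceil(phi * n))
    lowest, highest = window.rank_range(answer)
    if highest < lowest:
        return False
    return lowest - eps * n <= target <= highest + eps * n
```

For example, after inserting 10, 20, ..., 70 into a window of size 5, the window holds 30 through 70 and the exact median is 50. With ε = 0.2 the permitted rank error is 1, so 60 (rank 4) is an acceptable answer to a median query while 70 (rank 5) is not.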