Xiaobo Zhu, Guangjun Wu, Hong Zhang, Shupeng Wang, Bingnan Ma
2018 IEEE 25th International Conference on High Performance Computing (HiPC), December 2018. DOI: 10.1109/HiPC.2018.00033
Dynamic Count-Min Sketch for Analytical Queries Over Continuous Data Streams
Approximate query processing methods have been proposed for analytics over high-speed data streams: they compact continuous streams into a space-constrained sketch and provide reliable estimates for different queries. Count-Min (CM) is the state-of-the-art sketching structure, supporting many queries with error-guaranteed estimates under limited space. However, CM requires its counter table to be created in advance according to the size of the data stream, and that size is usually unpredictable for dynamic data streams. In this paper, we propose an approach called the Dynamic Count-Min sketch (DCM), which is suited to dynamic data sets and provides accurate estimates for point queries and self-join size queries. Our approach is composed of incremental CM sketches and allocates space in a pay-as-you-go manner. Both our mathematical analysis and extensive experiments show that our approach suits data sets with dynamic or skewed inputs and provides error-guaranteed estimates using less space than CM.
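To make the abstract's terms concrete, the following is a minimal Python sketch of a standard Count-Min sketch answering the two query types mentioned (point queries and self-join size queries), plus a hypothetical `ChainedCountMin` that allocates additional CM tables only as the stream grows. The chaining rule (a fixed `capacity` per table, a shared hash seed) and all class and parameter names are assumptions for illustration only. The abstract does not specify how DCM sizes or combines its incremental sketches, so this should not be read as the authors' algorithm. In the usual CM analysis, width is set to about e/ε and depth to about ln(1/δ).

```python
import random

# Mersenne prime 2^61 - 1, used as the modulus of the row hash functions.
_P = (1 << 61) - 1


class CountMin:
    """Standard Count-Min sketch over non-negative integer keys and counts."""

    def __init__(self, width, depth, seed=0):
        self.width, self.depth = width, depth
        rng = random.Random(seed)
        # Row j hashes key x to ((a_j * x + b_j) mod P) mod width.
        self.hashes = [(rng.randrange(1, _P), rng.randrange(_P)) for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]
        self.total = 0  # sum of all counts absorbed so far

    def cols(self, key):
        return [(a * key + b) % _P % self.width for a, b in self.hashes]

    def update(self, key, count=1):
        for row, col in zip(self.table, self.cols(key)):
            row[col] += count
        self.total += count

    def point(self, key):
        # Point query: min over rows; never underestimates with non-negative counts.
        return min(row[col] for row, col in zip(self.table, self.cols(key)))

    def self_join(self):
        # Self-join size (F2) estimate: min over rows of the sum of squared counters.
        return min(sum(c * c for c in row) for row in self.table)


class ChainedCountMin:
    """HYPOTHETICAL pay-as-you-go chain of CM sketches; not the paper's DCM.

    Assumptions: every sketch shares the same width, depth and hash seed, and a
    new sketch is allocated once the newest one has absorbed `capacity` counts.
    """

    def __init__(self, width, depth, capacity, seed=0):
        self.width, self.depth = width, depth
        self.capacity, self.seed = capacity, seed
        self.sketches = [CountMin(width, depth, seed)]

    def update(self, key, count=1):
        if self.sketches[-1].total >= self.capacity:
            # Allocate a new table only when the stream actually grows.
            self.sketches.append(CountMin(self.width, self.depth, self.seed))
        self.sketches[-1].update(key, count)

    def point(self, key):
        # Each term overestimates the key's count within its own sketch,
        # so the sum overestimates the key's total count.
        return sum(s.point(key) for s in self.sketches)

    def self_join(self):
        # Shared hashes let us add tables cell by cell; the merged table equals
        # a single CM over the whole stream, so the usual F2 estimator applies.
        merged = [[sum(cells) for cells in zip(*rows)]
                  for rows in zip(*(s.table for s in self.sketches))]
        return min(sum(c * c for c in row) for row in merged)
```

Summing per-sketch point estimates is valid here and is never looser than querying the merged table. The self-join estimate, however, is computed on the merged table. Summing per-sketch F2 estimates would drop the cross terms between a key's counts in different sketches and could underestimate.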