Dynamic sketching over distributed data streams

Guangjun Wu, Siyu Jia, Binbin Li, Shupeng Wang, Xiuguo Bao, Qingsheng Yuan
Published in: 2016 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), 2016-04-10
DOI: 10.1109/INFCOMW.2016.7562250 (https://doi.org/10.1109/INFCOMW.2016.7562250)
Citations: 1

Abstract

Many emerging applications impose strict requirements on query response time for different operators over distributed streaming data. As a result, approximate query answering with accurate sketches has become an important solution for processing fast-arriving streams. In this paper, we propose a dynamic sketching framework that can sample elements from streams with out-of-order data arrival and provides an error-guaranteed estimation scheme for many different operators. Within the sketch, we first extract characteristics of uniform sampling and exponential sampling from one-pass streaming data and organize them to support (ξ, δ)-approximation for different operators, such as aggregation operators (e.g., sum, count) and quantile operators (e.g., quantiles, median). Moreover, we construct the sketch in an accuracy-lossless and dynamic manner through operations such as sketch splitting and sketch merging, without any prior knowledge. The experimental results indicate that, compared to big data analytic systems (Spark, BlinkDB), our approach achieves a threefold improvement in throughput and an improvement of two orders of magnitude in query response time.
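
To make the idea of a mergeable, sampling-based sketch concrete, the following is a minimal illustration and not the framework proposed in the paper. It assumes a generic bottom-k uniform sample: each element receives a random tag and the sketch keeps the k elements with the smallest tags, so the result does not depend on arrival order and two sketches can be merged by keeping the k smallest tags of their union. The class name `UniformSampleSketch`, its methods, and the parameter `k` are hypothetical; exponential sampling, sketch splitting, and the (ξ, δ) error guarantees described in the abstract are not implemented here.

```python
import heapq
import random


class UniformSampleSketch:
    """Illustrative bottom-k uniform sample (hypothetical, not the paper's sketch)."""

    def __init__(self, k, seed=None):
        self.k = k                       # maximum number of sampled elements
        self.n = 0                       # exact count of elements seen
        self._rng = random.Random(seed)  # use independent seeds per sketch
        self._heap = []                  # (-tag, value); root holds the largest tag

    def add(self, value):
        """Observe one element; arrival order does not affect the sample."""
        self.n += 1
        self._offer(self._rng.random(), value)

    def _offer(self, tag, value):
        # Keep only the k elements with the smallest random tags.
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, (-tag, value))
        elif tag < -self._heap[0][0]:
            heapq.heapreplace(self._heap, (-tag, value))

    def merge(self, other):
        """Return a new sketch covering both inputs (assumes equal k)."""
        merged = UniformSampleSketch(self.k)
        merged.n = self.n + other.n
        for neg_tag, value in self._heap + other._heap:
            merged._offer(-neg_tag, value)
        return merged

    def values(self):
        return [value for _, value in self._heap]

    def estimate_count(self):
        return self.n

    def estimate_sum(self):
        # Scale the sample mean by the exact element count.
        vals = self.values()
        return self.n * sum(vals) / len(vals) if vals else 0.0

    def estimate_quantile(self, q):
        # Read the q-quantile directly from the sorted sample.
        vals = sorted(self.values())
        if not vals:
            raise ValueError("empty sketch")
        return vals[min(len(vals) - 1, int(q * len(vals)))]
```

In this sketch, each node of a distributed deployment would maintain its own instance and a coordinator would call `merge` before answering sum, count, or quantile queries. The estimates come with no formal error bound in this simplified form.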