Fixed-precision approximate continuous aggregate queries in peer-to-peer databases

F. Kashani, C. Shahabi
Published in: 6th International Conference on Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom 2010)
DOI: 10.4108/ICST.COLLABORATECOM.2010.2 (https://doi.org/10.4108/ICST.COLLABORATECOM.2010.2)
Publication date (as listed): 2008-04-07
Citations: 4

Abstract

We propose an efficient sample-based approach to answering fixed-precision approximate continuous aggregate queries in peer-to-peer databases. First, we define practical semantics for formulating such queries. Second, we propose “Digest”, a two-tier system for correct and efficient query answering by sampling. At the top tier, a query evaluation engine uses samples collected from the peer-to-peer database to continually estimate the running result of the query with guaranteed precision. For efficient query evaluation, we propose an extrapolation algorithm that predicts the evolution of the running result and adapts the frequency of the sampling occasions accordingly, avoiding redundant samples. We also introduce a repeated sampling algorithm that exploits the correlation between samples at successive sampling occasions, using linear regression to minimize the number of samples drawn at each occasion. At the bottom tier, we introduce a distributed algorithm for random sampling (uniform and nonuniform) from peer-to-peer databases with arbitrary network topology and tuple distribution. The sampling algorithm is based on the Metropolis Markov Chain Monte Carlo method, which guarantees the randomness of the sample with an arbitrarily small variation difference from the desired distribution, while remaining comparable to optimal sampling in sampling cost and time. We evaluate the efficiency of Digest via simulation using real data.
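
The bottom-tier idea of Metropolis-based sampling over an arbitrary topology can be illustrated with a minimal sketch. This is not the Digest algorithm itself: the abstract does not give its details, and the sketch below only samples peers (not tuples) uniformly, using the textbook Metropolis random walk on a graph. The names `metropolis_walk`, `sample_peers`, and `TOY_NETWORK`, the walk length, and the toy topology are all assumptions made for illustration.

```python
import random
from collections import Counter


def metropolis_walk(neighbors, start, steps, rng):
    """Run a Metropolis random walk and return the node where it ends.

    A move from u to a uniformly chosen neighbor v is accepted with
    probability min(1, deg(u) / deg(v)); otherwise the walk stays at u.
    This makes the transition matrix symmetric, so on a connected graph
    the stationary distribution is uniform over nodes.
    """
    current = start
    for _ in range(steps):
        candidate = rng.choice(neighbors[current])
        accept = min(1.0, len(neighbors[current]) / len(neighbors[candidate]))
        if rng.random() < accept:
            current = candidate
    return current


def sample_peers(neighbors, start, walk_length, n_samples, seed=0):
    """Draw n_samples peers, each from an independent walk of walk_length steps."""
    rng = random.Random(seed)
    return [
        metropolis_walk(neighbors, start, walk_length, rng)
        for _ in range(n_samples)
    ]


# Hypothetical 4-peer overlay with uneven degrees (adjacency lists).
TOY_NETWORK = {
    "a": ["b", "c", "d"],
    "b": ["a", "c"],
    "c": ["a", "b"],
    "d": ["a"],
}

if __name__ == "__main__":
    counts = Counter(sample_peers(TOY_NETWORK, "a", walk_length=30, n_samples=4000))
    for peer in sorted(TOY_NETWORK):
        print(peer, counts[peer] / 4000)
```

Although peer `a` has three times the degree of peer `d`, the acceptance rule corrects for the bias a plain random walk would have toward high-degree peers, so each peer should appear in roughly a quarter of the samples. Longer walks bring the sample distribution closer to uniform at the price of more hops, which is the kind of cost/accuracy trade-off the abstract refers to.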