Distributed execution of continuous queries

2014 IEEE 30th International Conference on Data Engineering. Published: 2014-03-01. DOI: 10.1109/ICDE.2014.6816767

Rajeev Gupta, K. Ramamritham


Citations: 1

Abstract

Data delivered over the internet is increasingly used to provide dynamic and personalized user experiences. To achieve this, queries are executed over fast-changing data from distributed sources. Because these queries require data from multiple sources, they are executed at an intermediate proxy, or data aggregator. Typically, users of these queries are not interested in every data update. Query results may be associated with an imprecision bound or a threshold, which can be used to limit the number of refresh messages. These queries can be categorized by the type of result required: in an entity-based query, the user only wants the IDs of the data items (or entities) that satisfy a given selection condition; in a value-based query, the user wants the value of some aggregation over distributed data items; and in a threshold query, the user wants to know whether a Boolean condition, expressed as a threshold over an aggregation of data items, is true. We methodically present techniques for executing all of these categories of continuous aggregation queries over distributed data so that the number of messages exchanged among data sources, aggregators, and users is minimized. The values of individual data items may also be uncertain, with an associated probability. A data aggregator can execute a query either by fetching all the required data or by sending appropriate sub-queries to the distributed data sources. To fetch the data, the aggregator can use either push-based or pull-based mechanisms. Each of these approaches minimizes the number of message exchanges in a different way, and we present algorithms for each.
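The abstract does not spell out the algorithms. To illustrate how an imprecision bound can suppress refresh messages in a push-based setting, here is a minimal sketch of one well-known scheme, not necessarily the paper's method. It assumes a value-based SUM query whose imprecision bound is split statically into per-source local bounds, and each source pushes only when its value drifts past its local bound since the last push. The class names `Source` and `Aggregator`, the static split, and the SUM aggregate are all hypothetical choices for illustration. The same cached estimate and bound also let the aggregator answer a threshold query ("is SUM > T?") without fetching data whenever the answer is already certain.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Source:
    """One data source holding a single item (hypothetical model)."""
    name: str
    value: float
    local_bound: float  # assumed static share of the query's imprecision bound
    last_pushed: float = field(init=False)

    def __post_init__(self) -> None:
        # The aggregator is assumed to start with the exact current value.
        self.last_pushed = self.value

    def update(self, new_value: float, aggregator: Aggregator) -> bool:
        """Apply a local update; push a refresh only if the cached copy drifts past the bound."""
        self.value = new_value
        if abs(self.value - self.last_pushed) > self.local_bound:
            self.last_pushed = self.value
            aggregator.refresh(self.name, self.value)
            return True
        return False  # suppressed: the cached value is still within local_bound


class Aggregator:
    """Keeps cached values and answers SUM-based continuous queries from the cache."""

    def __init__(self, sources: list[Source]) -> None:
        self.cache = {s.name: s.value for s in sources}
        # Each cached value is off by at most its local bound, so the SUM
        # is off by at most the sum of the local bounds.
        self.bound = sum(s.local_bound for s in sources)
        self.messages = 0

    def refresh(self, name: str, value: float) -> None:
        """Receive a pushed refresh message from a source."""
        self.cache[name] = value
        self.messages += 1

    def value_query(self) -> float:
        """Value-based query: estimated SUM, within +/- self.bound of the true SUM."""
        return sum(self.cache.values())

    def threshold_query(self, threshold: float) -> bool | None:
        """Threshold query 'SUM > threshold': True/False when certain, None otherwise."""
        estimate = self.value_query()
        if estimate - self.bound > threshold:
            return True
        if estimate + self.bound <= threshold:
            return False
        return None  # undecided: the aggregator would need fresher data (e.g. a pull)


sources = [Source("a", 10.0, 1.0), Source("b", 20.0, 2.0)]
agg = Aggregator(sources)
pushed = [
    sources[0].update(10.5, agg),  # drift 0.5 <= 1.0: suppressed
    sources[1].update(21.5, agg),  # drift 1.5 <= 2.0: suppressed
    sources[0].update(11.5, agg),  # drift 1.5 > 1.0: refresh sent
]
true_sum = sum(s.value for s in sources)  # 33.0, known only to the sources
print(pushed, agg.messages, agg.value_query(), agg.threshold_query(25.0))
# → [False, False, True] 1 31.5 True
```

Three updates cost one message, yet the cached SUM (31.5) stays within the query's bound of 3.0 of the true SUM (33.0). A static, equal-by-design split is the simplest choice; reallocating the bound toward faster-changing sources, or pulling when a threshold query is undecided, are the kinds of refinements that would further reduce message counts.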
