Proceedings 18th International Conference on Data Engineering: Latest Papers (Page 8)

Design and evaluation of alternative selection placement strategies in optimizing continuous queries

Proceedings 18th International Conference on Data Engineering Pub Date : 2002-08-07 DOI: 10.1109/ICDE.2002.994749

Jianjun Chen, D. DeWitt, J. Naughton

Abstract: We design and evaluate alternative selection placement strategies for optimizing a very large number of continuous queries in an Internet environment. Two grouping strategies, PushDown and PullUp, in which selections are either pushed below, or pulled above, joins are proposed and investigated. While our earlier research has demonstrated that the incremental group optimization can significantly outperform an ungrouped approach, the results from the paper show that different incremental group optimization strategies can have significantly different performance characteristics. Surprisingly, in our studies, PullUp, in which selections are pulled above joins, is often better and achieves an average 10 fold performance improvement over PushDown (occasionally 100 times faster). Furthermore, a revised algorithm of PullUp, termed filtered PullUp is proposed that is able to further reduce the cost of PullUp by 75% when the union of the selection predicates is selective. Detailed cost models, which consider several special parameters, including (1) characteristics of queries to be grouped, and (2) characteristics of data changes, are presented. Preliminary experiments using an implementation of both strategies show that our models are fairly accurate in predicting the results obtained from the implementation of these techniques in the Niagara system.

Citations: 114

Efficient evaluation of queries with mining predicates

Proceedings 18th International Conference on Data Engineering Pub Date : 2002-06-03 DOI: 10.1109/ICDE.2002.994772

S. Chaudhuri, Vivek R. Narasayya, Sunita Sarawagi

Citations: 40

Proceedings 18th International Conference on Data Engineering Pub Date : 2002-02-26 DOI: 10.1109/ICDE.2002.994702

S. Melnik, H. Garcia-Molina, E. Rahm

Citations: 1641

Evaluating top-k queries over Web-accessible databases

Proceedings 18th International Conference on Data Engineering Pub Date : 2002-02-26 DOI: 10.1109/ICDE.2002.994751

Nicolas Bruno, L. Gravano, A. Marian

Abstract: A query to a Web search engine usually consists of a list of keywords, to which the search engine responds with the best or "top" k pages for the query. This top-k query model is prevalent over multimedia collections in general, but also over plain relational data for certain applications. For example, consider a relation with information on available restaurants, including their location, price range for one diner, and overall food rating. A user who queries such a relation might simply specify the user's location and target price range, and expect in return the best 10 restaurants in terms of some combination of proximity to the user, closeness of match to the target price range, and overall food rating. Processing such top-k queries efficiently is challenging for a number of reasons. One critical such reason is that, in many Web applications, the relation attributes might not be available other than through external Web-accessible form interfaces, which we will have to query repeatedly for a potentially large set of candidate objects. In this paper, we study how to process top-k queries efficiently in this setting, where the attributes for which users specify target values might be handled by external, autonomous sources with a variety of access interfaces. We present several algorithms for processing such queries, and evaluate them thoroughly using both synthetic and real Web-accessible data.

Citations: 559

Sequenced subset operators: definition and implementation

Proceedings 18th International Conference on Data Engineering Pub Date : 2002-02-26 DOI: 10.1109/ICDE.2002.994699

Joseph Dunn, S. Davey, A. Descour, R. Snodgrass

Abstract: Difference, intersection, semi join and anti-semi-join may be considered binary subset operators, in that they all return a subset of their left-hand argument. These operators are useful for implementing SQL's EXCEPT, INTERSECT, NOT IN and NOT EXISTS, distributed queries and referential integrity. Difference-all and intersection-all operate on multi-sets and track the number of duplicates in both argument relations; they are used to implement SQL's EXCEPT ALL and INTERSECT ALL. Their temporally sequenced analogues, which effectively apply the subset operator at each point in time, are needed for implementing these constructs in temporal databases. These SQL expressions are complex; most necessitate at least a three-way join, with nested NOT EXISTS clauses. We consider how to implement these operators directly in a DBMS. These operators are interesting in that they can fragment the left-hand validity periods (sequenced difference-all also fragments the right-hand periods) and thus introduce memory complications found neither in their non-temporal counterparts nor in temporal joins and semijoins. We introduce novel algorithms for implementing these operators by ordering the computation so that fragments need not be retained in main memory. We evaluate these algorithms and demonstrate that they are no more expensive than a single conventional join.

Citations: 14
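
To make the notion of a temporally sequenced difference concrete, here is a minimal Python sketch. It is not the paper's algorithm: it is a naive in-memory version with set semantics, and the tuple layout `(value, start, end)` with half-open periods `[start, end)` and the function name `sequenced_difference` are assumptions made for illustration. It shows how subtracting right-hand periods can fragment a left-hand validity period into several pieces.

```python
from collections import defaultdict


def sequenced_difference(left, right):
    """Naive sequenced difference with set semantics (illustrative only).

    left, right: iterables of (value, start, end), where [start, end) is the
    validity period. Returns the parts of each left period during which no
    right tuple with the same value is valid.
    """
    # Group right-hand periods by value and sort them by start time.
    right_by_value = defaultdict(list)
    for value, start, end in right:
        right_by_value[value].append((start, end))
    for periods in right_by_value.values():
        periods.sort()

    result = []
    for value, start, end in left:
        cursor = start  # earliest instant of the left period not yet handled
        for r_start, r_end in right_by_value.get(value, []):
            if r_end <= cursor:
                continue  # right period lies entirely before the cursor
            if r_start >= end:
                break  # right period starts after the left period ends
            if r_start > cursor:
                # Uncovered gap: emit a fragment of the left period.
                result.append((value, cursor, r_start))
            cursor = max(cursor, r_end)
            if cursor >= end:
                break
        if cursor < end:
            result.append((value, cursor, end))
    return result


employees = [("a", 1, 10)]
on_leave = [("a", 3, 5), ("a", 7, 8), ("b", 0, 100)]
print(sequenced_difference(employees, on_leave))
# prints [('a', 1, 3), ('a', 5, 7), ('a', 8, 10)]
```

The single left-hand period `[1, 10)` comes back as three fragments, which is the fragmentation effect the abstract refers to. This sketch holds all right-hand periods in memory, whereas the abstract states that the paper's algorithms order the computation so that fragments need not be retained.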

StreamCorder: fast trial-and-error analysis in scientific databases

Proceedings 18th International Conference on Data Engineering Pub Date : 2002-02-26 DOI: 10.1109/ICDE.2002.994769

E. Stolte, G. Alonso

Citations: 0

Managing complex and varied data with the IndexFabric™

Proceedings 18th International Conference on Data Engineering Pub Date : 2002-02-26 DOI: 10.1109/ICDE.2002.994765

N. Sample, Brian F. Cooper, M. Franklin, Gísli R. Hjaltason, Moshe Shadmon, Levy Cohe

Citations: 6

Efficient indexing structures for mining frequent patterns

Proceedings 18th International Conference on Data Engineering Pub Date : 2002-02-26 DOI: 10.1109/ICDE.2002.994758

Bin Lan, B. Ooi, K. Tan

Citations: 16

An efficient index structure for shift and scale invariant search of multi-attribute time sequences

Proceedings 18th International Conference on Data Engineering Pub Date : 2002-02-26 DOI: 10.1109/ICDE.2002.994720

Tamer Kahveci, Ambuj K. Singh, Aliekber Gürel

Citations: 6

Detecting changes in XML documents

Proceedings 18th International Conference on Data Engineering Pub Date : 2002-02-26 DOI: 10.1109/ICDE.2002.994696

G. Cobena, S. Abiteboul, A. Marian

Abstract: We present a diff algorithm for XML data. This work is motivated by the support for change control in the context of the Xyleme project that is investigating dynamic warehouses capable of storing massive volumes of XML data. Because of the context, our algorithm has to be very efficient in terms of speed and memory space even at the cost of some loss of quality. Also, it considers, besides insertions, deletions and updates (standard in diffs), a move operation on subtrees that is essential in the context of XML. Intuitively, our diff algorithm uses signatures to match (large) subtrees that were left unchanged between the old and new versions. Such exact matchings are then possibly propagated to ancestors and descendants to obtain more matchings. It also uses XML specific information such as ID attributes. We provide a performance analysis of the algorithm. We show that it runs on average in linear time vs. quadratic time for previous algorithms. We present experiments on synthetic data that confirm the analysis. Since this problem is NP-hard, the linear time is obtained by trading some quality. We present experiments (again on synthetic data) that show that the output of our algorithm is reasonably close to the optimal in terms of quality. Finally we present experiments on a small sample of XML pages found on the Web.

Citations: 533
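
The signature idea mentioned in the abstract above can be illustrated with a short Python sketch. This is a simplified stand-in, not the paper's algorithm: the SHA-1 based signature, the function names `subtree_signatures` and `match_unchanged`, and the greedy top-down matching order are all assumptions chosen for illustration, and the propagation of matches to ancestors and descendants and the use of ID attributes are omitted.

```python
import hashlib
import xml.etree.ElementTree as ET


def subtree_signatures(root):
    """Map each element to a hash of its tag, attributes, text and children."""
    sigs = {}

    def visit(elem):
        h = hashlib.sha1()
        h.update(elem.tag.encode())
        for key, value in sorted(elem.attrib.items()):
            h.update(f"|{key}={value}".encode())
        h.update(b"|" + (elem.text or "").strip().encode())
        for child in elem:
            # Child signatures feed into the parent, so equal signatures
            # mean equal subtrees (up to hash collisions).
            h.update(visit(child).encode())
        sigs[elem] = h.hexdigest()
        return sigs[elem]

    visit(root)
    return sigs


def match_unchanged(old_root, new_root):
    """Greedily pair old subtrees with identical new subtrees, largest first.

    Simplification: descendants of an already matched new subtree are not
    removed from the candidate index, so this is only a sketch.
    """
    old_sigs = subtree_signatures(old_root)
    new_sigs = subtree_signatures(new_root)
    by_sig = {}
    for elem in new_root.iter():
        by_sig.setdefault(new_sigs[elem], []).append(elem)

    matches = []

    def visit(elem):
        candidates = by_sig.get(old_sigs[elem])
        if candidates:
            matches.append((elem, candidates.pop(0)))
            return  # the whole subtree is unchanged; do not descend
        for child in elem:
            visit(child)

    visit(old_root)
    return matches


old = ET.fromstring("<doc><a><b>x</b><c>y</c></a><d>z</d></doc>")
new = ET.fromstring("<doc><d>z</d><a><b>x</b><c>y</c></a><e/></doc>")
print([o.tag for o, _ in match_unchanged(old, new)])
# prints ['a', 'd']
```

Because matching is by content signature rather than by position, the `a` subtree is recognized as unchanged even though it moved within `doc`, which hints at why a move operation fits naturally with this style of matching.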