Proceedings 18th International Conference on Data Engineering: latest articles

Design and evaluation of alternative selection placement strategies in optimizing continuous queries
Proceedings 18th International Conference on Data Engineering Pub Date: 2002-08-07 DOI: 10.1109/ICDE.2002.994749
Jianjun Chen, D. DeWitt, J. Naughton
Abstract: We design and evaluate alternative selection placement strategies for optimizing a very large number of continuous queries in an Internet environment. Two grouping strategies, PushDown and PullUp, in which selections are either pushed below, or pulled above, joins are proposed and investigated. While our earlier research has demonstrated that incremental group optimization can significantly outperform an ungrouped approach, the results from the paper show that different incremental group optimization strategies can have significantly different performance characteristics. Surprisingly, in our studies, PullUp, in which selections are pulled above joins, is often better and achieves an average 10-fold performance improvement over PushDown (occasionally 100 times faster). Furthermore, a revised PullUp algorithm, termed filtered PullUp, is proposed that is able to further reduce the cost of PullUp by 75% when the union of the selection predicates is selective. Detailed cost models, which consider several special parameters, including (1) characteristics of queries to be grouped, and (2) characteristics of data changes, are presented. Preliminary experiments using an implementation of both strategies show that our models are fairly accurate in predicting the results obtained from the implementation of these techniques in the Niagara system.
Citations: 114
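The PushDown and PullUp placements named in the abstract above can be illustrated with a toy sketch. The data, the `queries` dictionary, and the helper names are hypothetical, and the code does not reproduce the paper's grouped operators or cost models; it only shows that filtering before a per-query join and filtering after one shared join give the same per-query answers.

```python
from collections import defaultdict

# Hypothetical toy inputs: two sources joined on "symbol".
quotes = [{"symbol": "A", "price": 10}, {"symbol": "B", "price": 50}]
news = [{"symbol": "A", "headline": "x"}, {"symbol": "B", "headline": "y"}]

# Each continuous query is modeled as one selection predicate (an assumption).
queries = {
    "q1": lambda row: row["price"] > 20,
    "q2": lambda row: row["price"] < 20,
}


def join(left, right):
    """Hash join on the "symbol" attribute."""
    index = defaultdict(list)
    for r in right:
        index[r["symbol"]].append(r)
    return [{**l, **r} for l in left for r in index[l["symbol"]]]


def push_down(queries, quotes, news):
    """Selection below the join: filter first, then join once per query."""
    return {
        name: join([q for q in quotes if pred(q)], news)
        for name, pred in queries.items()
    }


def pull_up(queries, quotes, news):
    """Selection above the join: one shared join, then filter per query."""
    shared = join(quotes, news)
    return {
        name: [row for row in shared if pred(row)]
        for name, pred in queries.items()
    }
```

In this sketch PullUp performs the join once regardless of the number of queries, which is the kind of sharing the abstract credits for its advantage; whether that wins in practice depends on the cost factors the paper models.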
Efficient evaluation of queries with mining predicates
Proceedings 18th International Conference on Data Engineering Pub Date: 2002-06-03 DOI: 10.1109/ICDE.2002.994772
S. Chaudhuri, Vivek R. Narasayya, Sunita Sarawagi
Abstract: Modern relational database systems are beginning to support ad-hoc queries on data mining models. In this paper, we explore novel techniques for optimizing queries that apply mining models to relational data. For such queries, we use the internal structure of the mining model to automatically derive traditional database predicates. We present algorithms for deriving such predicates for some popular discrete mining models: decision trees, naive Bayes, and clustering. Our experiments on Microsoft SQL Server 2000 demonstrate that these derived predicates can significantly reduce the cost of evaluating such queries.
Citations: 40
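For the decision-tree case mentioned in the abstract above, one plausible reading is that a predicate for a class label is the disjunction of the root-to-leaf path conditions that end in that label. The sketch below assumes a hypothetical tuple encoding of the tree (`("split", attribute, threshold, left, right)` and `("leaf", label)`); it is an illustration, not the paper's algorithm.

```python
def derive_predicate(node, label, path=()):
    """Return one conjunction (list of conditions) per leaf predicting `label`."""
    if node[0] == "leaf":
        return [list(path)] if node[1] == label else []
    _, attr, threshold, left, right = node
    return (
        derive_predicate(left, label, path + ((attr, "<=", threshold),))
        + derive_predicate(right, label, path + ((attr, ">", threshold),))
    )


def to_sql(conjunctions):
    """Render the disjunction of conjunctions as a SQL-style predicate string."""
    if not conjunctions:
        return "FALSE"
    rendered = []
    for conj in conjunctions:
        body = " AND ".join(f"{a} {op} {v}" for a, op, v in conj) or "TRUE"
        rendered.append(f"({body})")
    return " OR ".join(rendered)


# Hypothetical tree: split on age, then on income.
tree = (
    "split", "age", 30,
    ("leaf", "low"),
    ("split", "income", 50000, ("leaf", "low"), ("leaf", "high")),
)
```

A query such as "rows the model classifies as high" could then be rewritten with `to_sql(derive_predicate(tree, "high"))` as an ordinary filter that an optimizer can use for index selection.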
Similarity flooding: a versatile graph matching algorithm and its application to schema matching
Proceedings 18th International Conference on Data Engineering Pub Date: 2002-02-26 DOI: 10.1109/ICDE.2002.994702
S. Melnik, H. Garcia-Molina, E. Rahm
Abstract: Matching elements of two data schemas or two data instances plays a key role in data warehousing, e-business, or even biochemical applications. In this paper we present a matching algorithm based on a fixpoint computation that is usable across different scenarios. The algorithm takes two graphs (schemas, catalogs, or other data structures) as input, and produces as output a mapping between corresponding nodes of the graphs. Depending on the matching goal, a subset of the mapping is chosen using filters. After our algorithm runs, we expect a human to check and if necessary adjust the results. As a matter of fact, we evaluate the 'accuracy' of the algorithm by counting the number of needed adjustments. We conducted a user study, in which our accuracy metric was used to estimate the labor savings that the users could obtain by utilizing our algorithm to obtain an initial matching. Finally, we illustrate how our matching algorithm is deployed as one of several high-level operators in an implemented testbed for managing information models and mappings.
Citations: 1641
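The fixpoint computation described in the abstract above can be sketched in a heavily simplified form. The assumptions here are mine: graphs are lists of `(source, label, target)` edges, every node pair starts with similarity 1.0, similarity flows in both directions along pairs of equally labeled edges with unit weight, and values are normalized by the maximum after each round. The paper's propagation coefficients, initial similarities, and filters are not reproduced.

```python
def similarity_flooding(edges1, edges2, iterations=20):
    """Iteratively propagate similarity between node pairs of two labeled graphs."""
    nodes1 = {n for s, _, d in edges1 for n in (s, d)}
    nodes2 = {n for s, _, d in edges2 for n in (s, d)}
    sigma = {(a, b): 1.0 for a in nodes1 for b in nodes2}

    # Two node pairs are neighbours when both graphs connect them
    # with an edge carrying the same label.
    neighbours = {pair: [] for pair in sigma}
    for s1, l1, d1 in edges1:
        for s2, l2, d2 in edges2:
            if l1 == l2:
                neighbours[(s1, s2)].append((d1, d2))
                neighbours[(d1, d2)].append((s1, s2))

    for _ in range(iterations):
        # Each pair keeps its own value and receives its neighbours' values.
        nxt = {p: sigma[p] + sum(sigma[q] for q in neighbours[p]) for p in sigma}
        top = max(nxt.values())
        sigma = {p: v / top for p, v in nxt.items()}
    return sigma
```

Pairs that are structurally supported by similar neighbours keep high values, while unsupported pairs decay toward zero; a filter step would then pick a mapping from the ranked pairs.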
Evaluating top-k queries over Web-accessible databases
Proceedings 18th International Conference on Data Engineering Pub Date: 2002-02-26 DOI: 10.1109/ICDE.2002.994751
Nicolas Bruno, L. Gravano, A. Marian
Abstract: A query to a Web search engine usually consists of a list of keywords, to which the search engine responds with the best or "top" k pages for the query. This top-k query model is prevalent over multimedia collections in general, but also over plain relational data for certain applications. For example, consider a relation with information on available restaurants, including their location, price range for one diner, and overall food rating. A user who queries such a relation might simply specify the user's location and target price range, and expect in return the best 10 restaurants in terms of some combination of proximity to the user, closeness of match to the target price range, and overall food rating. Processing such top-k queries efficiently is challenging for a number of reasons. One critical such reason is that, in many Web applications, the relation attributes might not be available other than through external Web-accessible form interfaces, which we will have to query repeatedly for a potentially large set of candidate objects. In this paper, we study how to process top-k queries efficiently in this setting, where the attributes for which users specify target values might be handled by external, autonomous sources with a variety of access interfaces. We present several algorithms for processing such queries, and evaluate them thoroughly using both synthetic and real Web-accessible data.
Citations: 559
Sequenced subset operators: definition and implementation
Proceedings 18th International Conference on Data Engineering Pub Date: 2002-02-26 DOI: 10.1109/ICDE.2002.994699
Joseph Dunn, S. Davey, A. Descour, R. Snodgrass
Abstract: Difference, intersection, semijoin and anti-semijoin may be considered binary subset operators, in that they all return a subset of their left-hand argument. These operators are useful for implementing SQL's EXCEPT, INTERSECT, NOT IN and NOT EXISTS, distributed queries and referential integrity. Difference-all and intersection-all operate on multi-sets and track the number of duplicates in both argument relations; they are used to implement SQL's EXCEPT ALL and INTERSECT ALL. Their temporally sequenced analogues, which effectively apply the subset operator at each point in time, are needed for implementing these constructs in temporal databases. These SQL expressions are complex; most necessitate at least a three-way join, with nested NOT EXISTS clauses. We consider how to implement these operators directly in a DBMS. These operators are interesting in that they can fragment the left-hand validity periods (sequenced difference-all also fragments the right-hand periods) and thus introduce memory complications found neither in their non-temporal counterparts nor in temporal joins and semijoins. We introduce novel algorithms for implementing these operators by ordering the computation so that fragments need not be retained in main memory. We evaluate these algorithms and demonstrate that they are no more expensive than a single conventional join.
Citations: 14
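The fragmentation of left-hand validity periods mentioned in the abstract above can be seen in a naive sketch of sequenced difference under set semantics. Tuples are assumed to be `(value, start, end)` with half-open periods, which is my encoding rather than the paper's. This nested-loop version keeps all fragments in memory, which is exactly the cost the paper's algorithms are designed to avoid; it only illustrates the semantics.

```python
def sequenced_difference(left, right):
    """For each left tuple, keep the parts of its period not covered by
    any right tuple carrying the same value."""
    result = []
    for value, start, end in left:
        fragments = [(start, end)]
        for r_value, r_start, r_end in right:
            if r_value != value:
                continue
            remaining = []
            for f_start, f_end in fragments:
                if r_end <= f_start or r_start >= f_end:
                    # No overlap: the fragment survives unchanged.
                    remaining.append((f_start, f_end))
                    continue
                if f_start < r_start:
                    remaining.append((f_start, r_start))  # piece before
                if r_end < f_end:
                    remaining.append((r_end, f_end))      # piece after
            fragments = remaining
        result.extend((value, s, e) for s, e in fragments)
    return result
```

A single left tuple valid over [1, 10) that meets right tuples over [3, 5) and [7, 8) is split into three output tuples, which shows why one input row can produce many fragments.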
StreamCorder: fast trial-and-error analysis in scientific databases
Proceedings 18th International Conference on Data Engineering Pub Date: 2002-02-26 DOI: 10.1109/ICDE.2002.994769
E. Stolte, G. Alonso
Abstract: We have implemented a client/server system for fast trial-and-error analysis: the StreamCorder. The server streams wavelet-encoded views to the clients, where they are cached, decoded and processed. Low-quality decoding is beneficial for slow network connections. Low-resolution decoding greatly accelerates decoding and analysis. Depending on the system resources, cached data and analysis requirements, the user may alter the minimum analysis quality at any time.
Citations: 0
Managing complex and varied data with the IndexFabric™
Proceedings 18th International Conference on Data Engineering Pub Date: 2002-02-26 DOI: 10.1109/ICDE.2002.994765
N. Sample, Brian F. Cooper, M. Franklin, Gísli R. Hjaltason, Moshe Shadmon, Levy Cohe
Abstract: Emerging networked applications present significant challenges for traditional data management techniques for two reasons. First, they are based on data encoded in XML, LDAP directories, etc. that typically have complex inter-relationships. Second, the dynamic nature of networked applications and the need to integrate data from multiple sources result in data that is semi- or irregularly structured. The IndexFabric has been developed to meet both these challenges. In this demonstration, we show how the IndexFabric efficiently encodes and indexes very large collections of irregular, semistructured, and complex data.
Citations: 6
Efficient indexing structures for mining frequent patterns
Proceedings 18th International Conference on Data Engineering Pub Date: 2002-02-26 DOI: 10.1109/ICDE.2002.994758
Bin Lan, B. Ooi, K. Tan
Abstract: In this paper, we propose a variant of the signature file, called bit-sliced bloom-filtered signature file (BBS), as the basis for implementing filter-and-refine strategies for mining frequent patterns. In the filtering step, the candidate patterns are obtained by scanning BBS instead of the database. The resultant candidate set contains a superset of the frequent patterns. In the refinement phase, each algorithm refines the candidate set to prune away the false drops. Based on this indexing structure, we study two filtering (single and dual filter) and two refinement (sequential scan and probe) mechanisms, thus giving rise to four different strategies. We conducted an extensive performance study to assess the effectiveness of BBS, and compared the four proposed processing schemes with the traditional Apriori algorithm and the recently proposed FP-tree scheme. Our results show that BBS, as a whole, outperforms the Apriori strategy. Moreover, one of the schemes that is based on dual filter and probe refinement performs the best in all cases.
Citations: 16
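The filtering step described in the abstract above can be sketched as follows. The class name `BitSlicedSignatureFile`, its parameters, and the SHA-256 based hashing are assumptions made for illustration; the paper's actual layout and hash functions are not given in the abstract. Each slice is a bitmap over transactions for one signature bit, stored here as a Python integer.

```python
import hashlib


class BitSlicedSignatureFile:
    """Toy bit-sliced Bloom-filter signature file over transactions."""

    def __init__(self, num_bits=64, num_hashes=2):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.slices = [0] * num_bits  # slices[p] has bit t set if transaction t sets bit p
        self.count = 0                # number of transactions added

    def _positions(self, item):
        """Signature bit positions an item hashes to."""
        positions = set()
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            positions.add(int.from_bytes(digest[:8], "big") % self.num_bits)
        return positions

    def add(self, transaction):
        row = 1 << self.count
        for item in transaction:
            for pos in self._positions(item):
                self.slices[pos] |= row
        self.count += 1

    def estimate_support(self, itemset):
        """AND the slices of every bit the itemset needs; the popcount is an
        upper bound on the true support, since false drops are possible."""
        rows = (1 << self.count) - 1
        for item in itemset:
            for pos in self._positions(item):
                rows &= self.slices[pos]
        return bin(rows).count("1")
```

Because the estimate never undercounts, any itemset whose estimate falls below the support threshold can be discarded without touching the database; the survivors form the candidate superset that the refinement phase then verifies.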
An efficient index structure for shift and scale invariant search of multi-attribute time sequences
Proceedings 18th International Conference on Data Engineering Pub Date: 2002-02-26 DOI: 10.1109/ICDE.2002.994720
Tamer Kahveci, Ambuj K. Singh, Aliekber Gürel
Abstract: We consider the problem of shift and scale invariant search for multi-attribute time sequences. Our work fills a void in the existing literature for time sequence similarity since the existing techniques do not consider the general symmetric formulation of the problem. We define a new distance function for multi-attribute time sequences that is symmetric: the distance between two time sequences is defined to be the smallest Euclidean distance after scaling and shifting either one of the sequences to be as close to the other. We define two models for comparing multi-attribute time sequences: in the first model, the scaling and shifting of the component sequences are dependent, and in the second model they are independent. We propose a novel index structure called CS-Index (cone slice) for shift and scale invariant comparison of time sequences.
Citations: 6
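The symmetric distance defined in the abstract above can be sketched for the single-attribute case. Two assumptions are mine: "shifting" is read as adding a constant to every value and "scaling" as multiplying every value by a constant, and the best scale and shift are found with ordinary closed-form least squares. The multi-attribute models and the CS-Index itself are not covered.

```python
import math


def _best_fit_residual(x, y):
    """min over (a, b) of the Euclidean norm of (a*x + b - y)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0:
        # x is constant, so scaling cannot help; only the shift is used.
        return math.sqrt(syy)
    # Clamp at zero to absorb floating-point rounding.
    return math.sqrt(max(syy - sxy * sxy / sxx, 0.0))


def shift_scale_distance(x, y):
    """Symmetric distance: transform whichever sequence gets closer to the other."""
    return min(_best_fit_residual(x, y), _best_fit_residual(y, x))
```

Taking the minimum over both directions is what makes the function symmetric, matching the abstract's phrase "scaling and shifting either one of the sequences".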
Detecting changes in XML documents
Proceedings 18th International Conference on Data Engineering Pub Date: 2002-02-26 DOI: 10.1109/ICDE.2002.994696
G. Cobena, S. Abiteboul, A. Marian
Abstract: We present a diff algorithm for XML data. This work is motivated by the support for change control in the context of the Xyleme project that is investigating dynamic warehouses capable of storing massive volumes of XML data. Because of the context, our algorithm has to be very efficient in terms of speed and memory space even at the cost of some loss of quality. Also, it considers, besides insertions, deletions and updates (standard in diffs), a move operation on subtrees that is essential in the context of XML. Intuitively, our diff algorithm uses signatures to match (large) subtrees that were left unchanged between the old and new versions. Such exact matchings are then possibly propagated to ancestors and descendants to obtain more matchings. It also uses XML specific information such as ID attributes. We provide a performance analysis of the algorithm. We show that it runs on average in linear time vs. quadratic time for previous algorithms. We present experiments on synthetic data that confirm the analysis. Since this problem is NP-hard, the linear time is obtained by trading some quality. We present experiments (again on synthetic data) that show that the output of our algorithm is reasonably close to the optimal in terms of quality. Finally we present experiments on a small sample of XML pages found on the Web.
Citations: 533
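The signature-based matching of unchanged subtrees described in the abstract above can be sketched with the Python standard library. The hashing scheme (SHA-1 over tag, attributes, stripped text, and child signatures) and the top-down matching order are my assumptions; propagation to ancestors and descendants, ID attributes, and the paper's weighting of large subtrees are left out.

```python
import hashlib
import xml.etree.ElementTree as ET


def signature(elem, cache):
    """Hash a subtree from its tag, attributes, text, and child signatures."""
    parts = [elem.tag, repr(sorted(elem.attrib.items())), (elem.text or "").strip()]
    parts += [signature(child, cache) for child in elem]
    digest = hashlib.sha1("\x00".join(parts).encode()).hexdigest()
    cache[elem] = digest
    return digest


def match_unchanged(old_root, new_root):
    """Pair subtrees of the new version with identical subtrees of the old one."""
    old_sig, new_sig = {}, {}
    signature(old_root, old_sig)
    signature(new_root, new_sig)

    available = {}
    for elem in old_root.iter():
        available.setdefault(old_sig[elem], []).append(elem)

    matches = []
    stack = [new_root]
    while stack:
        elem = stack.pop()
        candidates = available.get(new_sig[elem])
        if candidates:
            # Whole subtree is unchanged (possibly moved); do not descend.
            matches.append((candidates.pop(0), elem))
            continue
        stack.extend(elem)  # try the children instead
    return matches
```

Because matching depends only on content signatures and not on position, a subtree that moved to a different place in the document is still paired with its old copy, which is what makes a move operation detectable.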