Proceedings. 20th International Conference on Data Engineering: Latest Publications

Algebraic signatures for scalable distributed data structures
Pub Date: 2004-03-30 | DOI: 10.1109/ICDE.2004.1320015
W. Litwin, T. Schwarz
{"title":"Algebraic signatures for scalable distributed data structures","authors":"W. Litwin, T. Schwarz","doi":"10.1109/ICDE.2004.1320015","DOIUrl":"https://doi.org/10.1109/ICDE.2004.1320015","url":null,"abstract":"Signatures detect changes to data objects. Numerous schemes are in use, especially the cryptographically secure standards SHA-1. We propose a novel signature scheme which we call algebraic signatures. The scheme uses the Galois field calculations. Its major property is the sure detection of any changes up to a parameterized size. More precisely, we detect for sure any changes that do not exceed n-symbols for an n-symbol algebraic signature. This property is new for any known signature scheme. For larger changes, the collision probability is typically negligible, as for the other known schemes. We apply the algebraic signatures to the scalable distributed data structures (SDDS). We filter at the SDDS client node the updates that do not actually change the records. We also manage the concurrent updates to data stored in the SDDS RAM buckets at the server nodes. We further use the scheme for the fast disk backup of these buckets. We sign our objects with 4-byte signatures, instead of 20-byte standard SHA-1 signatures. Our algebraic calculus is then also about twice as fast.","PeriodicalId":358862,"journal":{"name":"Proceedings. 20th International Conference on Data Engineering","volume":"197 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121149513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 53
Continuously maintaining quantile summaries of the most recent N elements over a data stream
Pub Date: 2004-03-30 | DOI: 10.1109/ICDE.2004.1320011
Xuemin Lin, Hongjun Lu, Jian Xu, J. Yu
{"title":"Continuously maintaining quantile summaries of the most recent N elements over a data stream","authors":"Xuemin Lin, Hongjun Lu, Jian Xu, J. Yu","doi":"10.1109/ICDE.2004.1320011","DOIUrl":"https://doi.org/10.1109/ICDE.2004.1320011","url":null,"abstract":"Statistics over the most recently observed data elements are often required in applications involving data streams, such as intrusion detection in network monitoring, stock price prediction in financial markets, Web log mining for access prediction, and user click stream mining for personalization. Among various statistics, computing quantile summary is probably most challenging because of its complexity. We study the problem of continuously maintaining quantile summary of the most recently observed N elements over a stream so that quantile queries can be answered with a guaranteed precision of /spl epsiv/N. We developed a space efficient algorithm for predefined N that requires only one scan of the input data stream and O(log(/spl epsiv//sup 2/N)//spl epsiv/+1//spl epsiv//sup 2/) space in the worst cases. We also developed an algorithm that maintains quantile summaries for most recent N elements so that quantile queries on any most recent n elements (n /spl les/ N) can be answered with a guaranteed precision of /spl epsiv/n. The worst case space requirement for this algorithm is only O(log/sup 2/(/spl epsiv/N)//spl epsiv//sup 2/). Our performance study indicated that not only the actual quantile estimation error is far below the guaranteed precision but the space requirement is also much less than the given theoretical bound.","PeriodicalId":358862,"journal":{"name":"Proceedings. 20th International Conference on Data Engineering","volume":"358 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122746675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 91
Database kernel research: what, if anything, is left to do?
Pub Date: 2004-03-30 | DOI: 10.1109/ICDE.2004.1320095
D. Lomet
{"title":"Database kernel research: what, if anything, is left to do?","authors":"D. Lomet","doi":"10.1109/ICDE.2004.1320095","DOIUrl":"https://doi.org/10.1109/ICDE.2004.1320095","url":null,"abstract":"Data Engineering deals with the use of engineering techniques and methodologies in the design, development and assessment of information systems for different computing platforms and application environments. The 20th International Conference on Data Engineering will be held in Boston, Massachusetts, USA-an academic and technological center with a variety of historical and cultural attractions of international prominence within walking distance.","PeriodicalId":358862,"journal":{"name":"Proceedings. 20th International Conference on Data Engineering","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133814423","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
BEA Liquid Data for WebLogic: XML-based enterprise information integration
Pub Date: 2004-03-30 | DOI: 10.1109/ICDE.2004.1320051
M. Carey
{"title":"BEA liquid data for WebLogic: XML-based enterprise information integration","authors":"M. Carey","doi":"10.1109/ICDE.2004.1320051","DOIUrl":"https://doi.org/10.1109/ICDE.2004.1320051","url":null,"abstract":"This presentation provides a technical overview of BEA Liquid Data for WebLogic, a relatively new product from BEA Systems that provides enterprise information integration capabilities to enterprise applications that are built and deployed using the BEA WebLogic Platform. Liquid Data takes an XML-centric approach to tackling the long-standing problem of integrating data from disparate data sources and making that information easily accessible to applications. In particular, Liquid Data uses the forthcoming XQuery language standard as the basis for defining integrated views of enterprise data and querying over those views. We provide a brief overview of the Liquid Data product architecture and then discuss some of the query processing technology that lies at the heart of the product.","PeriodicalId":358862,"journal":{"name":"Proceedings. 20th International Conference on Data Engineering","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115196792","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
Using stream semantics for continuous queries in media stream processors
Pub Date: 2004-03-30 | DOI: 10.1109/ICDE.2004.1320083
Amarnath Gupta, B. Liu, Pilho Kim, R. Jain
{"title":"Using stream semantics for continuous queries in media stream processors","authors":"Amarnath Gupta, B. Liu, Pilho Kim, R. Jain","doi":"10.1109/ICDE.2004.1320083","DOIUrl":"https://doi.org/10.1109/ICDE.2004.1320083","url":null,"abstract":"In the case of media and feature streams, explicit inter-stream constraints exist and can be exploited in the evaluation of continuous queries in the spirit of semantic query optimization. We express these properties using a media stream declaration language MSDL. In the demonstration, we present IMMERSI-MEET, an application built around an immersive environment. The IMMERSI-MEET system distinguishes between continuous streams, where values of different types come at a specified data rate, and discrete streams where sources push values intermittently. In MSDL, any dependence declaration must have at least one dependency specifying predicate in the body. As stream declarations are registered, the stream constraints are interpreted to construct a set of evaluation directives.","PeriodicalId":358862,"journal":{"name":"Proceedings. 20th International Conference on Data Engineering","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123361791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Meta data management
Pub Date: 2004-03-30 | DOI: 10.1109/ICDE.2004.1320101
Philip A. Bernstein, Sergey Melnik
{"title":"Meta data management","authors":"Philip A. Bernstein, Sergey Melnik","doi":"10.1109/ICDE.2004.1320101","DOIUrl":"https://doi.org/10.1109/ICDE.2004.1320101","url":null,"abstract":"By meta data management, we mean techniques for manipulating schemas and schema-like objects (such as interface definitions and web site maps) and mappings between them. Work on meta data problems goes back to at least the early 1970s, when data translation was the hot database research topic, even before relational databases caught on. Many popular research problems in the past five years are primarily meta data problems, such as data warehouse tools (e.g., ETL – to extract, transform and load), data integration, the semantic web, generation of XML or object-oriented wrappers for SQL databases, and generation of wrappers for web sites. Other classical meta data problems are information resource management, design tool support and integration, and schema evolution and data migration. Despite its longevity and continued importance, there is no widely-accepted conceptual framework for the meta data field, as there is for many other database topics, such as access methods, query processing, and transaction management. In this seminar, we propose such a conceptual framework. It consists of three layers: applications, design patterns, and basic operators. Applications are the end-user problems to be solved, like those listed in the previous paragraph. Design patterns are generic problems that need to be solved in support of many different applications, such as meta modeling (for all meta data problems), answering queries using views (for data integration and the semantic web), and change propagation (for data translation, schema evolution, and round-trip engineering). Basic operators are procedures that are needed to support multiple design patterns and applications, such as matching schemas to produce a mapping, merging schemas based on a mapping, and composing mappings. We will describe several meta data management problems, and for each, we will explain the design patterns and operators that are needed to solve it. We will summarize the main approaches to each design pattern and operator – the main choices of language, data structures, and algorithms – and will highlight the relevant papers that address it. This seminar is targeted at both practicing engineers and researchers. The former will learn about the latest solutions to important meta data problems and the many difficult unsolved problems that are best to avoid. Database researchers, especially professors, will benefit from considering the conceptual framework that we propose, since no database textbooks treat meta data management as a separate topic as far as we know.","PeriodicalId":358862,"journal":{"name":"Proceedings. 20th International Conference on Data Engineering","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122740915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 91
Scalable multimedia disk scheduling
Pub Date: 2004-03-30 | DOI: 10.1109/ICDE.2004.1320022
M. Mokbel, Walid G. Aref, Khaled M. Elbassioni, I. Kamel
{"title":"Scalable multimedia disk scheduling","authors":"M. Mokbel, Walid G. Aref, Khaled M. Elbassioni, I. Kamel","doi":"10.1109/ICDE.2004.1320022","DOIUrl":"https://doi.org/10.1109/ICDE.2004.1320022","url":null,"abstract":"A new multimedia disk-scheduling algorithm, termed Cascaded-SFC, is presented. The Cascaded-SFC multimedia disk scheduler is applicable in environments where multimedia data requests arrive with different quality of service (QoS) requirements such as real-time deadline and user priority. Previous work on disk scheduling has focused on optimizing the seek times and/or meeting the real-time deadlines. The Cascaded-SFC disk scheduler provides a unified framework for multimedia disk scheduling that scales with the number of scheduling parameters. The general idea is based on modeling the multimedia disk requests as points in multiple multidimensional subspaces, where each of the dimensions represents one of the parameters (e.g., one dimension represents the request deadline, another represents the disk cylinder number, and a third dimension represents the priority of the request, etc.). Each multidimensional subspace represents a subset of the QoS parameters that share some common scheduling characteristics. Then the multimedia disk scheduling problem reduces to the problem of finding a linear order to traverse the multidimensional points in each subspace. Multiple space-filling curves are selected to fit the scheduling needs of the QoS parameters in each subspace. The orders in each subspace are integrated in a cascaded way to provide a total order for the whole space. Comprehensive experiments demonstrate the efficiency and scalability of the Cascaded-SFC disk scheduling algorithm over other disk schedulers.","PeriodicalId":358862,"journal":{"name":"Proceedings. 20th International Conference on Data Engineering","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129703912","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
NEXSORT: sorting XML in external memory
Pub Date: 2004-03-30 | DOI: 10.1109/ICDE.2004.1320038
Adam Silberstein, Jun Yang
{"title":"NEXSORT: sorting XML in external memory","authors":"Adam Silberstein, Jun Yang","doi":"10.1109/ICDE.2004.1320038","DOIUrl":"https://doi.org/10.1109/ICDE.2004.1320038","url":null,"abstract":"XML plays an important role in delivering data over the Internet, and the need to store and manipulate XML in its native format has become increasingly relevant. This growing need necessitates work on developing native XML operators, especially for one as fundamental as sort. We present NEXSORT, an algorithm that leverages the hierarchical nature of XML to efficiently sort an XML document in external memory. In a fully sorted XML document, children of every nonleaf element are ordered according to a given sorting criterion. Among NEXSORT's uses is in combination with structural merge as the XML version of sort-merge join, which allows us to merge large XML documents using only a single pass once they are sorted. The hierarchical structure of an XML document limits the number of possible legal orderings among its elements, which means that sorting XML is fundamentally \"easier\" than sorting a flat file. We prove that the I/O lower bound for sorting XML in external memory is /spl Theta/(max{n,nlog/sub m/(k/B)}), where n is the number of blocks in the input XML document, m is the number of main memory blocks available for sorting, B is the number of elements that can fit in one block, and k is the maximum fan-out of the input document tree. We show that NEXSORT performs within a constant factor of this theoretical lower bound. In practice we demonstrate, even with a naive implementation, NEXSORT significantly outperforms a regular external merge sort of all elements by their key paths, unless the XML document is nearly flat, in which case NEXSORT degenerates essentially to external merge sort.","PeriodicalId":358862,"journal":{"name":"Proceedings. 20th International Conference on Data Engineering","volume":"107 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116938160","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Recursive XML schemas, recursive XML queries, and relational storage: XML-to-SQL query translation
Pub Date: 2004-03-30 | DOI: 10.1109/ICDE.2004.1319983
R. Krishnamurthy, Venkatesan T. Chakaravarthy, R. Kaushik, J. Naughton
{"title":"Recursive XML schemas, recursive XML queries, and relational storage: XML-to-SQL query translation","authors":"R. Krishnamurthy, Venkatesan T. Chakaravarthy, R. Kaushik, J. Naughton","doi":"10.1109/ICDE.2004.1319983","DOIUrl":"https://doi.org/10.1109/ICDE.2004.1319983","url":null,"abstract":"We consider the problem of translating XML queries into SQL when XML documents have been stored in an RDBMS using a schema-based relational decomposition. Surprisingly, there is no published XML-to-SQL query translation algorithm for this scenario that handles recursive XML schemas. We present a generic algorithm to translate path expression queries into SQL in the presence of recursion in the schema and queries. This algorithm handles a general class of XML-to-relational mappings, which includes all techniques proposed in literature. Some of the salient features of this algorithm are: (i) It translates a path expression query into a single SQL query, irrespective of how complex the XML schema is, (ii) It uses the \"with\" clause in SQL99 to handle recursive queries even over nonrecursive schemas, (iii) It reconstructs recursive XML subtrees with a single SQL query and (iv) It shows that the support for linear recursion in SQL99 is sufficient for handling path expression queries over arbitrarily complex recursive XML schema.","PeriodicalId":358862,"journal":{"name":"Proceedings. 20th International Conference on Data Engineering","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134573817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 72
An efficient algorithm for mining frequent sequences by a new strategy without support counting
Pub Date: 2004-03-30 | DOI: 10.1109/ICDE.2004.1320012
D. Chiu, Yi-Hung Wu, Arbee L. P. Chen
{"title":"An efficient algorithm for mining frequent sequences by a new strategy without support counting","authors":"D. Chiu, Yi-Hung Wu, Arbee L. P. Chen","doi":"10.1109/ICDE.2004.1320012","DOIUrl":"https://doi.org/10.1109/ICDE.2004.1320012","url":null,"abstract":"Mining sequential patterns in large databases is an important research topic. The main challenge of mining sequential patterns is the high processing cost due to the large amount of data. We propose a new strategy called direct sequence comparison (abbreviated as DISC), which can find frequent sequences without having to compute the support counts of nonfrequent sequences. The main difference between the DISC strategy and the previous works is the way to prune nonfrequent sequences. The previous works are based on the antimonotone property, which prune the nonfrequent sequences according to the frequent sequences with shorter lengths. On the contrary, the DISC strategy prunes the nonfrequent sequences according to the other sequences with the same length. Moreover, we summarize three strategies used in the previous works and design an efficient algorithm called DISC-all to take advantages of all the four strategies. The experimental results show that the DISC-all algorithm outperforms the PrefixSpan algorithm on mining frequent sequences in large databases. In addition, we analyze these strategies to design the dynamic version of our algorithm, which achieves a much better performance.","PeriodicalId":358862,"journal":{"name":"Proceedings. 20th International Conference on Data Engineering","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133138604","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 90