Proceedings 18th International Conference on Data Engineering: Latest Papers

Geometric-similarity retrieval in large image bases
Proceedings 18th International Conference on Data Engineering Pub Date : 2002-08-07 DOI: 10.1109/ICDE.2002.994757
I. Fudos, Leonidas Palios, E. Pitoura
Abstract: We propose a novel approach to shape-based image retrieval that builds upon a similarity criterion which is based on the average point set distance. Compared to traditional techniques, such as dimensionality reduction, our method exhibits better behavior in that it maintains the average topology of shapes independently of the number of points used to represent them and is more resilient to noise. An efficient algorithm is presented based on an incremental "fattening" of the query shape until the best match is discovered. The algorithm uses simplex range search techniques and fractional cascading to provide an average polylogarithmic time complexity on the total number of shape vertices. The algorithm is extended to perform additional fast approximate matching, when there is no image sufficiently similar to the query image. We present techniques for the efficient external storage of the shape base and of the auxiliary geometric data structures used by the algorithm. Finally, we show how our approach can be used for processing queries containing pairwise relations of object boundaries such as contain, tangent, and overlap. Such queries are either extracted from some user-drafted sketch or defined explicitly by the user. Alternative methods are presented for forming query execution plans.
Citations: 5
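The similarity criterion named in the abstract above, an average point set distance, can be illustrated with a brute-force sketch. The abstract does not give the exact definition, so the version below is an assumption: the mean distance from each point of one shape to its nearest point on the other, made symmetric by taking the larger of the two directions. The function names are hypothetical, and the quadratic scan stands in for the range-search structures the paper describes.

```python
import math

def directed_avg_distance(a, b):
    """Mean distance from each point of `a` to its nearest point in `b`."""
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)

def avg_point_set_distance(a, b):
    """Symmetric variant (an assumption): the larger of the two directed averages."""
    return max(directed_avg_distance(a, b), directed_avg_distance(b, a))

def best_match(query, shapes):
    """Return the name of the stored shape closest to `query`.

    `shapes` maps a name to a list of (x, y) points. This scans every shape,
    so it is only meant to show the criterion, not an efficient search.
    """
    return min(shapes, key=lambda name: avg_point_set_distance(query, shapes[name]))
```

Because the measure averages over points rather than taking a maximum, a single outlying vertex shifts the score only slightly, which is one plausible reading of the noise resilience the abstract claims.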
Approximating a data stream for querying and estimation: algorithms and performance evaluation
Proceedings 18th International Conference on Data Engineering Pub Date : 2002-08-07 DOI: 10.1109/ICDE.2002.994775
S. Guha, Nick Koudas
Abstract: Obtaining fast and good-quality approximations to data distributions is a problem of central interest to database management. A variety of popular database applications, including approximate querying, similarity searching and data mining in most application domains, rely on such good-quality approximations. Histogram-based approximation is a very popular method in database theory and practice to succinctly represent a data distribution in a space-efficient manner. In this paper, we place the problem of histogram construction into perspective and we generalize it by raising the requirement of a finite data set and/or known data set size. We consider the case of an infinite data set in which data arrive continuously, forming an infinite data stream. In this context, we present single-pass algorithms that are capable of constructing histograms of provably good quality. We present algorithms for the fixed-window variant of the basic histogram construction problem, supporting incremental maintenance of the histograms. The proposed algorithms trade accuracy for speed and allow for a graceful tradeoff between the two, based on application requirements. In the case of approximate queries on infinite data streams, we present a detailed experimental evaluation comparing our algorithms with other applicable techniques using real data sets, demonstrating the superiority of our proposal.
Citations: 109
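To make the fixed-window histogram problem from the abstract above concrete, here is a minimal sketch that buffers the last few stream values and builds an error-minimizing histogram over them with the classic dynamic program. This is not the paper's single-pass algorithm: it recomputes the exact optimum from scratch on every call, which is the expensive baseline that approximate, incrementally maintained histograms aim to avoid. The class and function names are hypothetical.

```python
from collections import deque
from itertools import accumulate

def v_optimal_histogram(values, num_buckets):
    """Split `values` into contiguous buckets minimizing total squared error.

    Returns (error, buckets), where each bucket is a half-open index range.
    """
    n = len(values)
    if n == 0:
        return 0.0, []
    b = min(num_buckets, n)
    prefix = [0.0] + list(accumulate(values))
    prefix_sq = [0.0] + list(accumulate(v * v for v in values))

    def sse(i, j):
        # Squared error of representing values[i:j] by their mean.
        s = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i)

    inf = float("inf")
    # cost[k][j]: best error for values[:j] using exactly k buckets.
    cost = [[inf] * (n + 1) for _ in range(b + 1)]
    split = [[0] * (n + 1) for _ in range(b + 1)]
    cost[0][0] = 0.0
    for k in range(1, b + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + sse(i, j)
                if c < cost[k][j]:
                    cost[k][j], split[k][j] = c, i

    # Walk the split table backwards to recover bucket boundaries.
    buckets, j = [], n
    for k in range(b, 0, -1):
        i = split[k][j]
        buckets.append((i, j))
        j = i
    buckets.reverse()
    return cost[b][n], buckets

class WindowedHistogram:
    """Keeps the most recent `window_size` values and rebuilds on demand."""

    def __init__(self, window_size, num_buckets):
        self.window = deque(maxlen=window_size)
        self.num_buckets = num_buckets

    def append(self, value):
        self.window.append(value)  # the oldest value drops out automatically

    def histogram(self):
        return v_optimal_histogram(list(self.window), self.num_buckets)
```

The dynamic program costs on the order of n²·B operations per rebuild for a window of n values and B buckets, which shows why trading some accuracy for speed matters once the window slides on every arrival.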
YFilter: efficient and scalable filtering of XML documents
Proceedings 18th International Conference on Data Engineering Pub Date : 2002-08-07 DOI: 10.1109/ICDE.2002.994748
Y. Diao, Peter M. Fischer, M. Franklin, Raymond To
Abstract: Much of the data exchanged over the Internet will soon be encoded in XML, allowing for sophisticated filtering and content-based routing. We have built a filtering engine called YFilter, which filters streaming XML documents according to XQuery or XPath queries that involve both path expressions and predicates. Unlike previous work, YFilter uses a novel NFA-based execution model. We present the structures and algorithms underlying YFilter, and show its efficiency and scalability under various workloads.
Citations: 293
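The idea of an NFA-based execution model for path filtering, as mentioned in the abstract above, can be sketched as follows. This is an illustrative toy under stated assumptions rather than YFilter's actual implementation: the `PathNFA` class, its method names, and the `("start", name)` / `("end", name)` event format are hypothetical, only the `/` and `//` axes and the `*` wildcard are handled, and predicates are omitted entirely. All queries are merged into one automaton so that common path prefixes share states.

```python
import re

class PathNFA:
    """One combined NFA for many simple path queries."""

    def __init__(self):
        self.trans = [{}]       # state -> {element name or "*": next state}
        self.eps = [None]       # state -> helper state reached for a "//" step
        self.loops = [False]    # helper states stay active at any depth
        self.accept = [set()]   # state -> ids of queries accepted there

    def _new_state(self, loops=False):
        self.trans.append({})
        self.eps.append(None)
        self.loops.append(loops)
        self.accept.append(set())
        return len(self.trans) - 1

    def add_query(self, query_id, path):
        state = 0
        for axis, name in re.findall(r"(//|/)([^/]+)", path):
            if axis == "//":
                if self.eps[state] is None:
                    self.eps[state] = self._new_state(loops=True)
                state = self.eps[state]
            if name not in self.trans[state]:
                self.trans[state][name] = self._new_state()
            state = self.trans[state][name]
        self.accept[state].add(query_id)

    def _closure(self, states):
        # Follow epsilon edges so "//" helper states become active too.
        out, todo = set(states), list(states)
        while todo:
            e = self.eps[todo.pop()]
            if e is not None and e not in out:
                out.add(e)
                todo.append(e)
        return out

    def match(self, events):
        """Run over start/end element events; return ids of matching queries."""
        stack = [self._closure({0})]
        matched = set()
        for kind, name in events:
            if kind == "start":
                nxt = set()
                for s in stack[-1]:
                    for label in (name, "*"):
                        t = self.trans[s].get(label)
                        if t is not None:
                            nxt.add(t)
                    if self.loops[s]:
                        nxt.add(s)
                nxt = self._closure(nxt)
                for s in nxt:
                    matched |= self.accept[s]
                stack.append(nxt)
            else:
                stack.pop()  # restore the active states of the parent element
        return matched
```

Keeping a stack of active state sets means an end-element event simply pops back to the parent's states, so the automaton never needs to be run backwards.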
Fjording the stream: an architecture for queries over streaming sensor data
Proceedings 18th International Conference on Data Engineering Pub Date : 2002-08-07 DOI: 10.1109/ICDE.2002.994774
S. Madden, M. Franklin
Abstract: If industry visionaries are correct, our lives will soon be full of sensors, connected together in loose conglomerations via wireless networks, each monitoring and collecting data about the environment at large. These sensors behave very differently from traditional database sources: they have intermittent connectivity, are limited by severe power constraints, and typically sample periodically and push immediately, keeping no record of historical information. These limitations make traditional database systems inappropriate for queries over sensors. We present the Fjords architecture for managing multiple queries over many sensors, and show how it can be used to limit sensor resource demands while maintaining high query throughput. We evaluate our architecture using traces from a network of traffic sensors deployed on Interstate 80 near Berkeley and present performance results that show how query throughput, communication costs and power consumption are necessarily coupled in sensor environments.
Citations: 602
Techniques for storing XML
Proceedings 18th International Conference on Data Engineering Pub Date : 2002-08-07 DOI: 10.1109/ICDE.2002.994740
M. Fernández, S. Amer-Yahia
Abstract: XML is the de facto standard for data exchange between applications on the Web. Applications, such as electronic markets, will produce and consume large volumes of data and therefore will require efficient and reliable storage and retrieval of XML data. Many techniques for XML storage have been proposed, including flat files, relational database management systems, object-oriented database systems, LDAP directories, and native XML database systems. To better understand the requirements of XML storage systems, we first review various classes of XML documents including highly structured data as stored in relational databases, "mixed" content from document-processing applications, and "streams-oriented" data from e-commerce and transactional applications. We also consider the types of queries typically applied to these classes of documents. In the second part, we present features of the XQuery and XPath data model that must be supported by an XML storage system and then we describe in detail a variety of storage alternatives from industry and research. We focus on techniques that use relational storage. Typically, these techniques produce a logical relational schema for the XML data and treat the storage system as a "black box". In the last part of the tutorial, we consider new techniques that open the storage system's "black box" so that we can take advantage of physical-layout features.
Citations: 10
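As one illustration of producing a logical relational schema for XML data, the sketch below shreds a document into a single generic node table and answers a path query with self-joins. This is a common generic mapping chosen for brevity, not necessarily one of the techniques the tutorial covers. The table layout and function names are assumptions, and attributes, tail text, and namespaces are ignored.

```python
import sqlite3
import xml.etree.ElementTree as ET

def shred(xml_text, conn):
    """Store every element as one row: (id, parent, ordinal, tag, text)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS node ("
        "id INTEGER PRIMARY KEY, parent INTEGER, ordinal INTEGER, "
        "tag TEXT, text TEXT)"
    )

    def visit(elem, parent_id, ordinal):
        cur = conn.execute(
            "INSERT INTO node (parent, ordinal, tag, text) VALUES (?, ?, ?, ?)",
            (parent_id, ordinal, elem.tag, (elem.text or "").strip()),
        )
        node_id = cur.lastrowid  # id assigned by SQLite for this row
        for i, child in enumerate(elem):
            visit(child, node_id, i)

    visit(ET.fromstring(xml_text), None, 0)

def item_names(conn):
    """Evaluate the path /order/item/name as a three-way self-join."""
    rows = conn.execute(
        "SELECT n.text FROM node o "
        "JOIN node i ON i.parent = o.id "
        "JOIN node n ON n.parent = i.id "
        "WHERE o.parent IS NULL AND o.tag = 'order' "
        "AND i.tag = 'item' AND n.tag = 'name' "
        "ORDER BY i.ordinal"
    ).fetchall()
    return [r[0] for r in rows]
```

Each step of the path costs one join in this layout, which hints at why the physical layout underneath the relational schema matters for longer paths.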
Efficient temporal join processing using indices
Proceedings 18th International Conference on Data Engineering Pub Date : 2002-08-07 DOI: 10.1109/ICDE.2002.994701
Donghui Zhang, V. Tsotras, B. Seeger
Abstract: We examine the problem of processing temporal joins in the presence of indexing schemes. Previous work on temporal joins has concentrated on non-indexed relations which were fully scanned. Given the large data volumes created by the ever increasing time dimension, sequential scanning is prohibitive. This is especially true when the temporal join involves only parts of the joining relations (e.g., a given time interval instead of the whole timeline). Utilizing an index then becomes beneficial as it directs the join to the data of interest. We consider temporal join algorithms for three representative indexing schemes, namely a B+-tree, an R*-tree and a temporal index, the Multiversion B+-tree (MVBT). Both the B+-tree and R*-tree result in simple but not efficient join algorithms because neither index achieves good temporal data clustering. Better clustering is maintained by the MVBT through record copying. Nevertheless, copies can greatly affect the correctness and effectiveness of the join algorithms. We identify these problems and propose efficient solutions and optimizations. An extensive comparison of all index-based temporal joins, using a variety of datasets and query characteristics, shows that the MVBT-based join algorithms are consistently faster. In particular, the link-based algorithm has the most robust behavior. In our experiments it showed a tenfold improvement over the R*-tree joins while it was between six and thirty times faster than the B+-tree joins.
Citations: 70
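A temporal join restricted to a query time interval, as described in the abstract above, can be sketched with a sorted list standing in for an index on start time. This is a simplification, not one of the paper's B+-tree, R*-tree, or MVBT algorithms. The tuple layout `(key, start, end, payload)` with half-open intervals and the function name are assumptions.

```python
from bisect import bisect_left
from collections import defaultdict

def temporal_join(r, s, q_start, q_end):
    """Join tuples with equal keys whose validity intervals overlap
    each other and the query window [q_start, q_end)."""
    # "Index" on s: tuples ordered by start time.
    s_sorted = sorted(s, key=lambda t: t[1])
    starts = [t[1] for t in s_sorted]
    # Tuples starting at or after q_end cannot intersect the window.
    cutoff = bisect_left(starts, q_end)

    by_key = defaultdict(list)
    for key, start, end, payload in s_sorted[:cutoff]:
        if end > q_start:  # still alive somewhere inside the window
            by_key[key].append((start, end, payload))

    out = []
    for key, start, end, payload in r:
        lo, hi = max(start, q_start), min(end, q_end)
        if lo >= hi:
            continue  # this tuple lies outside the window
        for s_start, s_end, s_payload in by_key.get(key, ()):
            a, b = max(lo, s_start), min(hi, s_end)
            if a < b:
                out.append((key, payload, s_payload, a, b))
    return out
```

Ordering by start time prunes only tuples that begin after the window. Long-lived tuples that began early must still be checked one by one, which is the kind of weak temporal clustering the abstract attributes to the non-temporal indexes.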
DBXplorer: a system for keyword-based search over relational databases
Proceedings 18th International Conference on Data Engineering Pub Date : 2002-08-07 DOI: 10.1109/ICDE.2002.994693
S. Agrawal, S. Chaudhuri, Gautam Das
Abstract: Internet search engines have popularized the keyword-based search paradigm. While traditional database management systems offer powerful query languages, they do not allow keyword-based search. In this paper, we discuss DBXplorer, a system that enables keyword-based searches in relational databases. DBXplorer has been implemented using a commercial relational database and Web server and allows users to interact via a browser front-end. We outline the challenges and discuss the implementation of our system, including results of extensive experimental evaluation.
Citations: 879
Mixing querying and navigation in MIX
Proceedings 18th International Conference on Data Engineering Pub Date : 2002-08-07 DOI: 10.1109/ICDE.2002.994714
Pratik Mukhopadhyay, Y. Papakonstantinou
Abstract: Web-based information systems provide to their users the ability to interleave querying and browsing during their information discovery efforts. The MIX system provides an API called QDOM (Querible Document Object Model) that supports the interleaved querying and browsing of virtual XML views, specified in an XQuery-like language. QDOM is based on the DOM standard. It allows the client applications to navigate into the view using standard DOM navigation commands. Then the application can use any visited node as the root for a query that creates a new view. The query/navigation processing algorithms of MIX perform decontextualization, i.e., they translate a query that has been issued from within the context of other queries and navigations into efficient queries that are understood by the source outside of the context of previous operations. In addition, MIX provides a navigation-driven query evaluation model, where source data are retrieved only as needed by the subsequent navigations. This paper presents how MIX supports QDOM on views of relational databases.
Citations: 20
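The navigation-driven evaluation model mentioned in the abstract above, where source data are retrieved only when a navigation needs them, can be illustrated with a lazily expanded tree. This sketch is not the QDOM API: `LazyNode`, `fetch_children`, and the in-memory `SOURCE` dictionary are hypothetical stand-ins for a view node, a source query, and a relational source.

```python
# Hypothetical stand-in for a remote source: label -> child labels.
SOURCE = {
    "customers": ["alice", "bob"],
    "alice": ["order-1", "order-2"],
    "bob": ["order-3"],
}
fetch_log = []  # records every source access, to make laziness visible

def fetch_children(label):
    """Simulates one query sent to the source for the children of `label`."""
    fetch_log.append(label)
    return SOURCE.get(label, [])

class LazyNode:
    """A view node whose children are fetched on first navigation only."""

    def __init__(self, label, fetch):
        self.label = label
        self._fetch = fetch
        self._children = None  # not yet retrieved from the source

    def children(self):
        if self._children is None:
            self._children = [
                LazyNode(child, self._fetch) for child in self._fetch(self.label)
            ]
        return self._children
```

Creating the root touches the source zero times. Each subtree costs one fetch the first time a client navigates into it, and repeated navigation reuses the cached children.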
Streaming-data algorithms for high-quality clustering
Proceedings 18th International Conference on Data Engineering Pub Date : 2002-08-07 DOI: 10.1109/ICDE.2002.994785
Liadan O'Callaghan, A. Meyerson, R. Motwani, Nina Mishra, S. Guha
Abstract: Streaming data analysis has recently attracted attention in numerous applications including telephone records, Web documents and click streams. For such analysis, single-pass algorithms that consume a small amount of memory are critical. We describe such a streaming algorithm that effectively clusters large data streams. We also provide empirical evidence of the algorithm's performance on synthetic and real data streams.
Citations: 681
Design and implementation of a high-performance distributed Web crawler
Proceedings 18th International Conference on Data Engineering Pub Date : 2002-08-07 DOI: 10.1109/ICDE.2002.994750
Vladislav Shkapenyuk, Torsten Suel
Abstract: Broad Web search engines as well as many more specialized search tools rely on Web crawlers to acquire large collections of pages for indexing and analysis. Such a Web crawler may interact with millions of hosts over a period of weeks or months, and thus issues of robustness, flexibility, and manageability are of major importance. In addition, I/O performance, network resources, and OS limits must be taken into account in order to achieve high performance at a reasonable cost. In this paper, we describe the design and implementation of a distributed Web crawler that runs on a network of workstations. The crawler scales to (at least) several hundred pages per second, is resilient against system crashes and other events, and can be adapted to various crawling applications. We present the software architecture of the system, discuss the performance bottlenecks, and describe efficient techniques for achieving high performance. We also report preliminary experimental results based on a crawl of 120 million pages on 5 million hosts.
Citations: 410