21st International Conference on Data Engineering (ICDE'05): Latest Papers

Online mining of data streams: applications, techniques and progress
21st International Conference on Data Engineering (ICDE'05) Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.101
Haixun Wang, J. Pei, Philip S. Yu
Abstract: In this paper, we focus on the differences between mining static large data sets and data streams. Over the years, the database and data mining communities have learned valuable lessons from mining static large data sets, and developed many useful algorithms and tools for this purpose. The paper aims at providing a shortcut to the current frontier of stream mining research. We emphasize the research problems, the inherent technical challenges and the latest results. In particular, the paper highlights new challenges and potential research interests. The research community has been interested in the integration between data mining tasks and database management systems.
Citations: 10
THALIA: Test Harness for the Assessment of Legacy Information Integration Approaches
21st International Conference on Data Engineering (ICDE'05) Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.140
J. Hammer, M. Stonebraker, Oguzhan Topsakal
Abstract: We introduce our new, publicly available testbed and benchmark called THALIA (Test Harness for the Assessment of Legacy Information Integration Approaches) for testing and evaluating integration technologies. THALIA provides researchers with a collection of 40 downloadable data sources representing university course catalogs from computer science departments worldwide. In addition, THALIA currently provides a set of twelve challenge queries as well as a scoring function for ranking the performance of an integration system. A second contribution is a systematic classification of the types of syntactic and semantic heterogeneities, which directly lead to the twelve challenge queries. We have chosen course information as our domain of discourse because it is well known and easy to understand. Furthermore, there is an abundance of data sources publicly available that allowed us to develop a testbed exhibiting all of the syntactic and semantic heterogeneities that we have identified.
Citations: 58
A multiresolution symbolic representation of time series
21st International Conference on Data Engineering (ICDE'05) Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.10
V. Megalooikonomou, Qiang Wang, Guo Li, C. Faloutsos
Abstract: Efficiently and accurately searching for similarities among time series and discovering interesting patterns is an important and non-trivial problem. In this paper, we introduce a new representation of time series, the multiresolution vector quantized (MVQ) approximation, along with a new distance function. The novelty of MVQ is that it keeps both local and global information about the original time series in a hierarchical mechanism, processing the original time series at multiple resolutions. Moreover, the proposed representation is symbolic, employing key subsequences, and potentially allows the application of text-based retrieval techniques to the similarity analysis of time series. The proposed method is fast and scales linearly with the size of the database and the dimensionality. In contrast to the vast majority of the literature, which uses the Euclidean distance, MVQ uses a multi-resolution/hierarchical distance function. We performed experiments with real and synthetic data. The proposed distance function consistently outperforms all the major competitors (Euclidean, dynamic time warping, piecewise aggregate approximation), achieving up to 20% better precision/recall and clustering accuracy on the tested datasets.
Citations: 109
Configurable security protocols for multi-party data analysis with malicious participants
21st International Conference on Data Engineering (ICDE'05) Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.37
B. Malin, E. Airoldi, Samuel Edoho-Eket, Yiheng Li
Abstract: Standard multi-party computation models assume semi-honest behavior, where the majority of participants implement protocols according to specification, an assumption not always plausible. In this paper we introduce a multi-party protocol for collaborative data analysis when participants are malicious and fail to follow specification. The protocol incorporates a semi-trusted third party, which analyzes encrypted data and provides honest responses that only intended recipients can successfully decrypt. The protocol incorporates data confidentiality by enabling participants to receive encrypted responses tailored to their own encrypted data submissions without revealing plaintext to other participants, including the third party. As opposed to previous models, trust need only be placed in a single participant with no data at stake. Additionally, the proposed protocol is configurable in a way that security features are controlled by independent subprotocols. Various combinations of subprotocols allow for a flexible security system, appropriate for a number of distributed data applications, such as secure list comparison.
Citations: 18
Adaptive caching for continuous queries
21st International Conference on Data Engineering (ICDE'05) Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.15
S. Babu, Kamesh Munagala, J. Widom, R. Motwani
Abstract: We address the problem of executing continuous multiway join queries in unpredictable and volatile environments. Our query class captures windowed join queries in data stream systems as well as conventional maintenance of materialized join views. Our adaptive approach handles streams of updates whose rates and data characteristics may change over time, as well as changes in system conditions such as memory availability. In this paper we focus specifically on the problem of adaptive placement and removal of caches to optimize join performance. Our approach automatically considers conventional tree-shaped join plans with materialized subresults at every intermediate node, subresult-free MJoins, and the entire spectrum between them. We provide algorithms for selecting caches, monitoring their cost and benefits in current conditions, allocating memory to caches, and adapting as conditions change. All of our algorithms are implemented in the STREAM prototype data stream management system and a thorough experimental evaluation is included.
Citations: 104
On discovery of extremely low-dimensional clusters using semi-supervised projected clustering
21st International Conference on Data Engineering (ICDE'05) Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.96
Kevin Y. Yip, D. Cheung, M. Ng
Abstract: Recent studies suggest that projected clusters with extremely low dimensionality exist in many real datasets. A number of projected clustering algorithms have been proposed in the past several years, but few can identify clusters with dimensionality lower than 10% of the total number of dimensions, which are commonly found in some real datasets such as gene expression profiles. In this paper we propose a new algorithm that can accurately identify projected clusters with relevant dimensions as few as 5% of the total number of dimensions. It makes use of a robust objective function that combines object clustering and dimension selection into a single optimization problem. The algorithm can also utilize domain knowledge in the form of labeled objects and labeled dimensions to improve its clustering accuracy. We believe this is the first semi-supervised projected clustering algorithm. Both theoretical analysis and experimental results show that by using a small amount of input knowledge, possibly covering only a portion of the underlying classes, the new algorithm can be further improved to accurately detect clusters with only 1% of the dimensions being relevant. The algorithm is also useful in getting a target set of clusters when there are multiple possible groupings of the objects.
Citations: 79
A relationally complete visual query language for heterogeneous data sources and pervasive querying
21st International Conference on Data Engineering (ICDE'05) Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.12
S. Polyviou, G. Samaras, P. Evripidou
Abstract: In this paper we introduce and formally define Query by Browsing (QBB), a scalable, relationally complete visual query language based on the desktop user interface paradigm and tuple relational calculus that allows the formulation of complex queries over relational, entity-relationship, object-oriented and XML data sources on a variety of handheld and desktop platforms. It is to our knowledge the first visual query language to combine the important characteristics of usability, scalability, expressive power and flexibility. We support these claims by demonstrating the similarity of the QBB paradigm to the popular desktop user interface paradigm, by relating it to relational calculus and relational algebra and by describing Chiromancer II, a Web-based implementation of the QBB paradigm for handheld devices. We also discuss ways in which non-relational sources can be represented and queried and compare QBB to related work in the area of visual query languages for a variety of data models. We finally offer conclusions and thoughts for future work.
Citations: 23
Finding (recently) frequent items in distributed data streams
21st International Conference on Data Engineering (ICDE'05) Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.68
A. Manjhi, Vladislav Shkapenyuk, Kedar Dhamdhere, Christopher Olston
Abstract: We consider the problem of maintaining frequency counts for items occurring frequently in the union of multiple distributed data streams. Naive methods of combining approximate frequency counts from multiple nodes tend to result in excessively large data structures that are costly to transfer among nodes. To minimize communication requirements, the degree of precision maintained by each node while counting item frequencies must be managed carefully. We introduce the concept of a precision gradient for managing precision when nodes are arranged in a hierarchical communication structure. We then study the optimization problem of how to set the precision gradient so as to minimize communication, and provide optimal solutions that minimize worst-case communication load over all possible inputs. We then introduce a variant designed to perform well in practice, with input data that does not conform to worst-case characteristics. We verify the effectiveness of our approach empirically using real-world data, and show that our methods incur substantially less communication than naive approaches while providing the same error guarantees on answers.
Citations: 216
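The problem statement in the abstract above (frequent items in the union of several streams) can be made concrete with a small exact baseline. The sketch below is an illustration only and is not the precision-gradient method of the paper: every node ships its full exact counts to a coordinator, which is the kind of communication-heavy approach the paper sets out to avoid. The names `local_counts`, `frequent_in_union`, and the `support` parameter are hypothetical.

```python
from collections import Counter


def local_counts(stream):
    """Exact item counts for the stream observed at one node."""
    return Counter(stream)


def frequent_in_union(node_counts, support):
    """Return items whose count in the union of all streams is at least
    support * (total number of items), together with their counts.

    Exact baseline: every node sends all of its counts (assumption made
    for illustration; the paper bounds this communication instead).
    """
    if not 0 < support <= 1:
        raise ValueError("support must be in (0, 1]")
    total = Counter()
    for counts in node_counts:
        total.update(counts)  # add per-node counts item by item
    threshold = support * sum(total.values())
    return {item: cnt for item, cnt in total.items() if cnt >= threshold}


if __name__ == "__main__":
    streams = [["a", "b", "a", "c"], ["a", "d", "b"], ["b", "a", "e"]]
    print(frequent_in_union([local_counts(s) for s in streams], 0.25))
```

The cost of this baseline grows with the number of distinct items at each node, which is why approximate per-node summaries with controlled error are attractive in the distributed setting.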
Towards building a MetaQuerier: extracting and matching Web query interfaces
21st International Conference on Data Engineering (ICDE'05) Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.145
Bin He, Zhen Zhang, K. Chang
Abstract: We witness the rapid growth and thus the prevalence of databases on the Web. Our recent study in April 2004 estimated 450,000 online databases. On this deep Web, myriad databases provide dynamic query-based data access through their query interfaces, instead of static URL links. It is thus essential to integrate these query interfaces for integrating the deep Web. The overall goal of the MetaQuerier project is to open up the deep Web to users by building a system that helps users explore and integrate deep Web sources. In particular, to start with, we focus on the integration of deep Web sources in the same domain, which is itself an important integration task. To automate this integration scenario, we need to solve two critical problems: extracting query interfaces and matching query interfaces. To solve the interface extraction problem, we introduce a parsing paradigm by hypothesizing the existence of hidden syntax which describes the layout and semantics of Web interfaces. Also, unlike traditional pairwise schema matching, we propose a holistic matching approach, which matches all schemas at the same time with the hypothesis of a hidden schema model. Therefore, our techniques explore, in essence, "data mining for information integration." That is, we mine the observable information to discover the underlying semantics.
Citations: 16
AutoLag: automatic discovery of lag correlations in stream data
21st International Conference on Data Engineering (ICDE'05) Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.24
Yasushi Sakurai, S. Papadimitriou, C. Faloutsos
Abstract: We have introduced the problem of automatic lag correlation detection on streaming data and proposed AutoLag to address this problem by using careful approximations and smoothing. Our experiments on real and realistic data show that AutoLag works as expected, estimating the unknown lags with excellent accuracy and significant speed-up. In our experiments on real and realistic data, AutoLag was up to about 42,000 times faster than the naive implementation, with at most 1% relative error.
Citations: 8
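To make the notion of a "naive implementation" of lag correlation detection in the AutoLag abstract concrete, the following is a minimal sketch under stated assumptions: it recomputes the Pearson correlation from scratch for every candidate lag and returns the lag with the largest absolute correlation. It does not reproduce AutoLag's approximations or smoothing, and the function names and the `max_lag` parameter are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # undefined for a constant sequence; treated as 0 here
    return cov / math.sqrt(var_a * var_b)


def naive_best_lag(x, y, max_lag):
    """Return (lag, corr) maximizing |corr(x[t], y[t + lag])| over
    0 <= lag <= max_lag. Each lag costs O(n), so the scan is O(n * max_lag)."""
    n = min(len(x), len(y))
    best_lag, best_corr = 0, 0.0
    for lag in range(max_lag + 1):
        if n - lag < 2:
            break  # too few overlapping points to correlate
        c = pearson(x[: n - lag], y[lag:n])
        if abs(c) > abs(best_corr):
            best_lag, best_corr = lag, c
    return best_lag, best_corr


if __name__ == "__main__":
    x = [math.sin(0.3 * t) for t in range(200)]
    y = [math.sin(0.3 * (t - 5)) for t in range(200)]  # x delayed by 5 steps
    print(naive_best_lag(x, y, max_lag=10))
```

Because the whole overlap is rescanned for each lag and again whenever new values arrive, this brute-force scan is expensive on long streams, which is the cost the abstract's speed-up figure is measured against.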