21st International Conference on Data Engineering (ICDE'05): Latest Papers, Page 9

Online mining of data streams: applications, techniques and progress

21st International Conference on Data Engineering (ICDE'05) Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.101

Haixun Wang, J. Pei, Philip S. Yu

Citations: 10

THALIA: Test Harness for the Assessment of Legacy Information Integration Approaches

21st International Conference on Data Engineering (ICDE'05) Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.140

J. Hammer, M. Stonebraker, Oguzhan Topsakal

Citations: 58

A multiresolution symbolic representation of time series

21st International Conference on Data Engineering (ICDE'05) Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.10

V. Megalooikonomou, Qiang Wang, Guo Li, C. Faloutsos

Abstract: Efficiently and accurately searching for similarities among time series and discovering interesting patterns is an important and non-trivial problem. In this paper, we introduce a new representation of time series, the multiresolution vector quantized (MVQ) approximation, along with a new distance function. The novelty of MVQ is that it keeps both local and global information about the original time series in a hierarchical mechanism, processing the original time series at multiple resolutions. Moreover, the proposed representation is symbolic employing key subsequences and potentially allows the application of text-based retrieval techniques into the similarity analysis of time series. The proposed method is fast and scales linearly with the size of database and the dimensionality. Contrary to the vast majority in the literature that uses the Euclidean distance, MVQ uses a multi-resolution/hierarchical distance function. We performed experiments with real and synthetic data. The proposed distance function consistently outperforms all the major competitors (Euclidean, dynamic time warping, piecewise aggregate approximation) achieving up to 20% better precision/recall and clustering accuracy on the tested datasets.

Citations: 109

Configurable security protocols for multi-party data analysis with malicious participants

21st International Conference on Data Engineering (ICDE'05) Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.37

B. Malin, E. Airoldi, Samuel Edoho-Eket, Yiheng Li

Abstract: Standard multi-party computation models assume semi-honest behavior, where the majority of participants implement protocols according to specification, an assumption not always plausible. In this paper we introduce a multi-party protocol for collaborative data analysis when participants are malicious and fail to follow specification. The protocol incorporates a semi-trusted third party, which analyzes encrypted data and provides honest responses that only intended recipients can successfully decrypt. The protocol incorporates data confidentiality by enabling participants to receive encrypted responses tailored to their own encrypted data submissions without revealing plaintext to other participants, including the third party. As opposed to previous models, trust need only be placed on a single participant with no data at stake. Additionally, the proposed protocol is configurable in a way that security features are controlled by independent subprotocols. Various combinations of subprotocols allow for a flexible security system, appropriate for a number of distributed data applications, such as secure list comparison.

Citations: 18

Adaptive caching for continuous queries

21st International Conference on Data Engineering (ICDE'05) Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.15

S. Babu, Kamesh Munagala, J. Widom, R. Motwani

Citations: 104

On discovery of extremely low-dimensional clusters using semi-supervised projected clustering

21st International Conference on Data Engineering (ICDE'05) Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.96

Kevin Y. Yip, D. Cheung, M. Ng

Abstract: Recent studies suggest that projected clusters with extremely low dimensionality exist in many real datasets. A number of projected clustering algorithms have been proposed in the past several years, but few can identify clusters with dimensionality lower than 10% of the total number of dimensions, which are commonly found in some real datasets such as gene expression profiles. In this paper we propose a new algorithm that can accurately identify projected clusters with relevant dimensions as few as 5% of the total number of dimensions. It makes use of a robust objective function that combines object clustering and dimension selection into a single optimization problem. The algorithm can also utilize domain knowledge in the form of labeled objects and labeled dimensions to improve its clustering accuracy. We believe this is the first semi-supervised projected clustering algorithm. Both theoretical analysis and experimental results show that by using a small amount of input knowledge, possibly covering only a portion of the underlying classes, the new algorithm can be further improved to accurately detect clusters with only 1% of the dimensions being relevant. The algorithm is also useful in getting a target set of clusters when there are multiple possible groupings of the objects.

Citations: 79

A relationally complete visual query language for heterogeneous data sources and pervasive querying

21st International Conference on Data Engineering (ICDE'05) Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.12

S. Polyviou, G. Samaras, P. Evripidou

Citations: 23

Finding (recently) frequent items in distributed data streams

21st International Conference on Data Engineering (ICDE'05) Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.68

A. Manjhi, Vladislav Shkapenyuk, Kedar Dhamdhere, Christopher Olston

Abstract: We consider the problem of maintaining frequency counts for items occurring frequently in the union of multiple distributed data streams. Naive methods of combining approximate frequency counts from multiple nodes tend to result in excessively large data structures that are costly to transfer among nodes. To minimize communication requirements, the degree of precision maintained by each node while counting item frequencies must be managed carefully. We introduce the concept of a precision gradient for managing precision when nodes are arranged in a hierarchical communication structure. We then study the optimization problem of how to set the precision gradient so as to minimize communication, and provide optimal solutions that minimize worst-case communication load over all possible inputs. We then introduce a variant designed to perform well in practice, with input data that does not conform to worst-case characteristics. We verify the effectiveness of our approach empirically using real-world data, and show that our methods incur substantially less communication than naive approaches while providing the same error guarantees on answers.

Citations: 216

Towards building a MetaQuerier: extracting and matching Web query interfaces

21st International Conference on Data Engineering (ICDE'05) Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.145

Bin He, Zhen Zhang, K. Chang

Abstract: We witness the rapid growth and thus the prevalence of databases on the Web. Our recent study in April 2004 estimated 450,000 online databases. On this deep Web, myriad databases provide dynamic query-based data access through their query interfaces, instead of static URL links. It is thus essential to integrate these query interfaces for integrating the deep Web. The overall goal of the MetaQuerier project aims at opening up the deep Web to users, by building a system to help users exploring and integrating deep Web sources. In particular, to start with, we focus on the integration of deep Web sources in the same domain, which is itself an important integration task. To automate this integration scenario, we need to solve two critical problems: extracting query interfaces and matching query interfaces. To solve the interface extraction problem, we introduce a parsing paradigm by hypothesizing the existence of hidden syntax which describes the layout and semantic of Web interfaces. Also, unlike traditional pairwise schema matching, we propose a holistic matching approach, which matches all schemas at the same time with the hypothesis of a hidden schema model. Therefore, our techniques explore, in essence, "data mining for information integration." That is, we mine the observable information to discover the underlying semantics.

Citations: 16

AutoLag: automatic discovery of lag correlations in stream data

21st International Conference on Data Engineering (ICDE'05) Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.24

Yasushi Sakurai, S. Papadimitriou, C. Faloutsos

Citations: 8