{"title":"R-trees with Update Memos","authors":"Xiaopeng Xiong, Walid G. Aref","doi":"10.1109/ICDE.2006.125","DOIUrl":"https://doi.org/10.1109/ICDE.2006.125","url":null,"abstract":"The problem of frequently updating multi-dimensional indexes arises in many location-dependent applications. While the R-tree and its variants are among the dominant choices for indexing multi-dimensional objects, the R-tree exhibits inferior performance in the presence of frequent updates. In this paper, we present an R-tree variant, termed the RUM-tree (short for R-tree with Update Memo), that minimizes the cost of object updates. The RUM-tree processes updates in a memo-based approach that avoids disk accesses for purging old entries during an update process. Therefore, the cost of an update operation in the RUM-tree reduces to the cost of only an insert operation. The removal of old object entries is carried out by a garbage cleaner inside the RUM-tree. In this paper, we present the details of the RUM-tree and study its properties. Theoretical analysis and experimental evaluation demonstrate that the RUM-tree outperforms other R-tree variants by up to a factor of eight in scenarios with frequent updates.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"65 1","pages":"22-22"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85127798","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Segmentation of Publication Records of Authors from the Web","authors":"Wei Zhang, Clement T. Yu, N. Smalheiser, Vetle I. Torvik","doi":"10.1109/ICDE.2006.137","DOIUrl":"https://doi.org/10.1109/ICDE.2006.137","url":null,"abstract":"Publication records are often found in the authors’ personal home pages. If such a record is partitioned into a list of semantic fields of authors, title, date, etc., the unstructured texts can be converted into structured data, which can be used in other applications. In this paper, we present PEPURS, a publication record segmentation system. It adopts a novel \"Split and Merge\" strategy. A publication record is split into segments; multiple statistical classifiers compute their likelihoods of belonging to different fields; finally adjacent segments are merged if they belong to the same field. PEPURS introduces the punctuation marks and their neighboring texts as a new feature to distinguish different roles of the marks. PEPURS yields high accuracy scores in experiments.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"5 1","pages":"120-120"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90727799","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Composition and Disclosure of Unlinkable Distributed Databases","authors":"B. Malin, L. Sweeney","doi":"10.1109/ICDE.2006.41","DOIUrl":"https://doi.org/10.1109/ICDE.2006.41","url":null,"abstract":"An individual’s location-visit pattern, or trail, can be leveraged to link sensitive data back to identity. We propose a secure multiparty computation protocol that enables locations to provably prevent such linkages. The protocol incorporates a controllable parameter specifying the minimum number of identities a sensitive piece of data must be linkable to via its trail.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"25 1","pages":"118-118"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84029051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Discovery of Emerging Frequent Patterns in Arbitrary Windows on Data Streams","authors":"Xiaoming Jin, Xinqiang Zuo, K. Lam, Jianmin Wang, Jiaguang Sun","doi":"10.1109/ICDE.2006.57","DOIUrl":"https://doi.org/10.1109/ICDE.2006.57","url":null,"abstract":"This paper proposes an effective data mining technique for finding useful patterns in streaming sequences. At present, typical approaches to this problem are to search for patterns in a fixed-size window sliding through the stream of data being collected. The practical value of such approaches is limited in that, in typical application scenarios, the patterns are emerging and it is difficult, if not impossible, to determine a priori a suitable window size within which useful patterns may exist. It is therefore desirable to devise techniques that can identify useful patterns with arbitrary window sizes. Attempts at this problem are challenging, however, because they require highly efficient searching in a substantially bigger solution space. This paper presents a new method which includes firstly a pruning strategy to reduce the search space and secondly a mining strategy that adopts a dynamic index structure to allow efficient discovery of emerging patterns in a streaming sequence. Experimental results on real data and synthetic data show that the proposed method outperforms other existing schemes both in computational efficiency and effectiveness in finding useful patterns.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"18 1","pages":"113-113"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87089894","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"What’s Different: Distributed, Continuous Monitoring of Duplicate-Resilient Aggregates on Data Streams","authors":"Graham Cormode, S. Muthukrishnan, W. Zhuang","doi":"10.1109/ICDE.2006.173","DOIUrl":"https://doi.org/10.1109/ICDE.2006.173","url":null,"abstract":"Emerging applications in sensor systems and network-wide IP traffic analysis present many technical challenges. They need distributed monitoring and continuous tracking of events. They have severe resource constraints not only at each site in terms of per-update processing time and archival space for high-speed streams of observations, but also crucially, communication constraints for collaborating on the monitoring task. These elements have been addressed in a series of recent works. A fundamental issue that arises is that one cannot make the \"uniqueness\" assumption on observed events which is present in previous works, since wide-scale monitoring invariably encounters the same events at different points. For example, within the network of an Internet Service Provider packets of the same flow will be observed in different routers; similarly, the same individual will be observed by multiple mobile sensors in monitoring wild animals. Aggregates of interest on such distributed environments must be resilient to duplicate observations. We study such duplicate-resilient aggregates that measure the extent of the duplication (how many unique observations there are, how many observations are unique) as well as standard holistic aggregates such as quantiles and heavy hitters over the unique items. We present accuracy-guaranteed, highly communication-efficient algorithms for these aggregates that work within the time and space constraints of high-speed streams. We also present results of a detailed experimental study on both real-life and synthetic data.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"7 1","pages":"57-57"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85632651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SIPPER: Selecting Informative Peers in Structured P2P Environment for Content-Based Retrieval","authors":"Shuigeng Zhou, Zhengjie Zhang, Weining Qian, Aoying Zhou","doi":"10.1109/ICDE.2006.139","DOIUrl":"https://doi.org/10.1109/ICDE.2006.139","url":null,"abstract":"In this demonstration, we present a prototype system called SIPPER, which is the abbreviation for Selecting Informative Peers in Structured P2P Environment for Content-based Retrieval. SIPPER distinguishes itself from the existing P2P-IR systems by the following two features: First, to improve retrieval efficiency, SIPPER employs a novel peer selection method to direct the query to a small fraction of relevant peers in the network for searching globally relevant documents. Second, to reduce the bandwidth cost of metadata publishing, SIPPER uses a new publishing mechanism, the term-node publishing mechanism, which is different from the traditional term-document model [2].","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"23 1","pages":"161-161"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85779219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ACXESS - Access Control for XML with Enhanced Security Specifications","authors":"Sriram Mohan, Jonathan Klinginsmith, Arijit Sengupta, Yuqing Wu","doi":"10.1109/ICDE.2006.12","DOIUrl":"https://doi.org/10.1109/ICDE.2006.12","url":null,"abstract":"We present ACXESS (Access Control for XML with Enhanced Security Specifications), a system for specifying and enforcing enhanced security constraints on XML via virtual \"security views\" and query rewrites. ACXESS is the first system that bears the capability to specify and enforce complicated security policies on both subtrees and structural relationships.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"23 1","pages":"171-171"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79696932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"C-Cubing: Efficient Computation of Closed Cubes by Aggregation-Based Checking","authors":"Dong Xin, Zheng Shao, Jiawei Han, Hongyan Liu","doi":"10.1109/ICDE.2006.31","DOIUrl":"https://doi.org/10.1109/ICDE.2006.31","url":null,"abstract":"It is well recognized that data cubing often produces huge outputs. Two popular efforts devoted to this problem are (1) iceberg cube, where only significant cells are kept, and (2) closed cube, where a group of cells which preserve roll-up/drill-down semantics are losslessly compressed to one cell. Due to its usability and importance, efficient computation of closed cubes still warrants a thorough study. In this paper, we propose a new measure, called closedness, for efficient closed data cubing. We show that closedness is an algebraic measure and can be computed efficiently and incrementally. Based on the closedness measure, we develop an aggregation-based approach, called C-Cubing (i.e., Closed-Cubing), and integrate it into two successful iceberg cubing algorithms: MM-Cubing and Star-Cubing. Our performance study shows that C-Cubing runs almost one order of magnitude faster than the previous approaches. We further study how the performance of the alternative algorithms of C-Cubing varies w.r.t. the properties of the data sets.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"93 1","pages":"4-4"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83886403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mondrian Multidimensional K-Anonymity","authors":"K. LeFevre, D. DeWitt, R. Ramakrishnan","doi":"10.1109/ICDE.2006.101","DOIUrl":"https://doi.org/10.1109/ICDE.2006.101","url":null,"abstract":"K-Anonymity has been proposed as a mechanism for protecting privacy in microdata publishing, and numerous recoding \"models\" have been considered for achieving k-anonymity. This paper proposes a new multidimensional model, which provides an additional degree of flexibility not seen in previous (single-dimensional) approaches. Often this flexibility leads to higher-quality anonymizations, as measured both by general-purpose metrics and more specific notions of query answerability. Optimal multidimensional anonymization is NP-hard (like previous optimal k-anonymity problems). However, we introduce a simple greedy approximation algorithm, and experimental results show that this greedy algorithm frequently leads to more desirable anonymizations than exhaustive optimal algorithms for two single-dimensional models.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"1 1","pages":"25-25"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82839072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Continuous Reverse Nearest Neighbor Monitoring","authors":"Tian Xia, Donghui Zhang","doi":"10.1109/ICDE.2006.43","DOIUrl":"https://doi.org/10.1109/ICDE.2006.43","url":null,"abstract":"Continuous spatio-temporal queries have recently received increasing attention due to the abundance of location-aware applications. This paper addresses the Continuous Reverse Nearest Neighbor (CRNN) Query. Given a set of objects O and a query set Q, the CRNN query monitors the exact reverse nearest neighbors of each query point, under the model that both the objects and the query points may move unpredictably. Existing methods for the reverse nearest neighbor (RNN) query either are static or assume a priori knowledge of the trajectory information, and thus do not apply. Related recent work on continuous range query and continuous nearest neighbor query relies on the fact that a simple monitoring region exists. Due to the unique features of the RNN problem, it is non-trivial to even define a monitoring region for the CRNN query. This paper defines the monitoring region for the CRNN query, discusses how to perform initial computation, and then focuses on incremental CRNN monitoring upon updates. The monitoring region corresponding to one query point consists of two types of regions. We argue that the two types should be handled separately. In continuous monitoring, two optimization techniques are proposed. Experimental results show that our proposed approach is both efficient and scalable.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"5 1","pages":"77-77"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76392151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}