{"title":"AutoGlobe: An Automatic Administration Concept for Service-Oriented Database Applications","authors":"S. Seltzsam, D. Gmach, Stefan Krompass, A. Kemper","doi":"10.1109/ICDE.2006.26","DOIUrl":"https://doi.org/10.1109/ICDE.2006.26","url":null,"abstract":"Future database application systems will be designed as Service Oriented Architectures (SOAs) like SAP’s NetWeaver instead of monolithic software systems such as SAP’s R/3. The decomposition in finer-grained services allows the usage of hardware clusters and a flexible serviceto- server allocation but also increases the complexity of administration. Thus, new administration techniques like our self-organizing infrastructure that we developed in cooperation with the SAP Adaptive Computing Infrastructure (ACI) group are necessary. For our purpose the available hardware is virtualized, pooled, and monitored. A fuzzy logic based controller module supervises all services running on the hardware platform and remedies exceptional situations automatically. With this self-organizing infrastructure we reduce the necessary hardware and administration overhead and, thus, lower the total cost of ownership (TCO). We used our prototype implementation, called Auto- Globe, for SAP-internal tests and we performed comprehensive simulation studies to demonstrate the effectiveness of our proposed concept.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"90 1","pages":"90-90"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76730817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Esther Ryvkina, Anurag Maskey, Mitch Cherniack, S. Zdonik
{"title":"Revision Processing in a Stream Processing Engine: A High-Level Design","authors":"Esther Ryvkina, Anurag Maskey, Mitch Cherniack, S. Zdonik","doi":"10.1109/ICDE.2006.130","DOIUrl":"https://doi.org/10.1109/ICDE.2006.130","url":null,"abstract":"Data stream processing systems have become ubiquitous in academic [1, 2, 5, 6] and commercial [11] sectors, with application areas that include financial services, network traffic analysis, battlefield monitoring and traffic control [3]. The append-only model of streams implies that input data is immutable and therefore always correct. But in practice, streaming data sources often contend with noise (e.g., embedded sensors) or data entry errors (e.g., financial data feeds) resulting in erroneous inputs and therefore, erroneous query results. Many data stream sources (e.g., commercial ticker feeds) issue \"revision tuples\" (revisions) that amend previously issued tuples (e.g. erroneous share prices). Ideally, any stream processing engine should process revision inputs by generating revision outputs that correct previous query results. We know of no stream processing system that presently has this capability.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"5 1","pages":"141-141"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73164823","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ashwin Machanavajjhala, J. Gehrke, Daniel Kifer, Muthuramakrishnan Venkitasubramaniam
{"title":"L-diversity: privacy beyond k-anonymity","authors":"Ashwin Machanavajjhala, J. Gehrke, Daniel Kifer, Muthuramakrishnan Venkitasubramaniam","doi":"10.1145/1217299.1217302","DOIUrl":"https://doi.org/10.1145/1217299.1217302","url":null,"abstract":"Publishing data about individuals without revealing sensitive information about them is an important problem. In recent years, a new definition of privacy called kappa-anonymity has gained popularity. In a kappa-anonymized dataset, each record is indistinguishable from at least k—1 other records with respect to certain \"identifying\" attributes. In this paper we show with two simple attacks that a kappa-anonymized dataset has some subtle, but severe privacy problems. First, we show that an attacker can discover the values of sensitive attributes when there is little diversity in those sensitive attributes. Second, attackers often have background knowledge, and we show that kappa-anonymity does not guarantee privacy against attackers using background knowledge. We give a detailed analysis of these two attacks and we propose a novel and powerful privacy definition called ell-diversity. In addition to building a formal foundation for ell-diversity, we show in an experimental evaluation that ell-diversity is practical and can be implemented efficiently.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"44 1","pages":"24-24"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73750049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mining Dense Periodic Patterns in Time Series Data","authors":"Chang Sheng, W. Hsu, M. Lee","doi":"10.1109/ICDE.2006.97","DOIUrl":"https://doi.org/10.1109/ICDE.2006.97","url":null,"abstract":"Existing techniques to mine periodic patterns in time series data are focused on discovering full-cycle periodic patterns from an entire time series. However, many useful partial periodic patterns are hidden in long and complex time series data. In this paper, we aim to discover the partial periodicity in local segments of the time series data. We introduce the notion of character density to partition the time series into variable-length fragments and to determine the lower bound of each character’s period. We propose a novel algorithm, called DPMiner, to find the dense periodic patterns in time series data. Experimental results on both synthetic and real-life datasets demonstrate that the proposed algorithm is effective and efficient to reveal interesting dense periodic patterns.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"4 1","pages":"115-115"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84232954","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"XPlainer: An XPath Debugging Framework","authors":"M. Consens, John W. S. Liu, Bill O'Farrell","doi":"10.1109/ICDE.2006.177","DOIUrl":"https://doi.org/10.1109/ICDE.2006.177","url":null,"abstract":"XML is an important practical paradigm in information technology and has a broad range of applications. How to access and retrieve the XML data is crucial to these applications. There are two standard ways for accessing and manipulating XML data, the Simple API for XML (SAX) and the Document Object Model (DOM). However, when an application needs to traverse through XML data, it is not easy to retrieve the required data with these two standard ways. XML data is impossible to be retrieved back and forth by SAX, and the graph-oriented DOM notation is not easy to work with. With such limitation, the W3C supervises the development of three important languages: XPath [3], XQuery and XSLT for exploring and querying XML. Among these three languages, XPath is the key and cornerstone language for the other two. XPath defines expressions for traversing an XML document and specifies the set of nodes (XPath 1.0) or the sequence of nodes (XPath 2.0) in the XML document.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"78 1","pages":"170-170"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84032335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Probabilistic Message Passing in Peer Data Management Systems","authors":"P. Cudré-Mauroux, K. Aberer, Andras Feher","doi":"10.1109/ICDE.2006.118","DOIUrl":"https://doi.org/10.1109/ICDE.2006.118","url":null,"abstract":"Until recently, most data integration techniques involved central components, e.g., global schemas, to enable transparent access to heterogeneous databases. Today, however, with the democratization of tools facilitating knowledge elicitation in machine-processable formats, one cannot rely on global, centralized schemas anymore as knowledge creation and consumption are getting more and more dynamic and decentralized. Peer Data Management Systems (PDMS) provide an answer to this problem by eliminating the central semantic component and considering instead compositions of local, pair-wise mappings to propagate queries from one database to the others. PDMS approaches proposed so far make the implicit assumption that all mappings used in this way are correct. This obviously cannot be taken as granted in typical PDMS settings where mappings can be created (semi) automatically by independent parties. In this work, we propose a totally decentralized, efficient message passing scheme to automatically detect erroneous mappings in PDMS. Our scheme is based on a probabilistic model where we take advantage of transitive closures of mapping operations to confront local belief on the correctness of a mapping against evidences gathered around the network. We show that our scheme can be efficiently embedded in any PDMS and provide a preliminary evaluation of our techniques on sets of both automatically-generated and real-world schemas.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"34 1","pages":"41-41"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80457948","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"End-biased Samples for Join Cardinality Estimation","authors":"Cristian Estan, J. Naughton","doi":"10.1109/ICDE.2006.61","DOIUrl":"https://doi.org/10.1109/ICDE.2006.61","url":null,"abstract":"We present a new technique for using samples to estimate join cardinalities. This technique, which we term \"end-biased samples,\" is inspired by recent work in network traffic measurement. It improves on random samples by using coordinated pseudo-random samples and retaining the sampled values in proportion to their frequency. We show that end-biased samples always provide more accurate estimates than random samples with the same sample size. The comparison with histograms is more interesting ― while end-biased histograms are somewhat better than end-biased samples for uncorrelated data sets, end-biased samples dominate by a large margin when the data is correlated. Finally, we compare end-biased samples to the recently proposed \"skimmed sketches\" and show that neither dominates the other, that each has different and compelling strengths and weaknesses. These results suggest that endbiased samples may be a useful addition to the repertoire of techniques used for data summarization.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"1 1","pages":"20-20"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82857464","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enabling Query Processing on Spatial Networks","authors":"Jagan Sankaranarayanan, H. Alborzi, H. Samet","doi":"10.1109/ICDE.2006.60","DOIUrl":"https://doi.org/10.1109/ICDE.2006.60","url":null,"abstract":"A system that enables real time query processing on large spatial networks is demonstrated. The system provides functionality for processing a wide range of spatial queries such as nearest neighbor searches and spatial joins on spatial networks of sufficiently large sizes.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"31 1","pages":"163-163"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88096475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Beckmann, A. Halverson, R. Krishnamurthy, J. Naughton
{"title":"Extending RDBMSs To Support Sparse Datasets Using An Interpreted Attribute Storage Format","authors":"J. Beckmann, A. Halverson, R. Krishnamurthy, J. Naughton","doi":"10.1109/ICDE.2006.67","DOIUrl":"https://doi.org/10.1109/ICDE.2006.67","url":null,"abstract":"\"Sparse\" data, in which relations have many attributes that are null for most tuples, presents a challenge for relational database management systems. If one uses the normal \"horizontal\" schema to store such data sets in any of the three leading commercial RDBMS, the result is tables that occupy vast amounts of storage, most of which is devoted to nulls. If one attempts to avoid this storage blowup by using a \"vertical\" schema, the storage utilization is indeed better, but query performance is orders of magnitude slower for certain classes of queries. In this paper, we argue that the proper way to handle sparse data is not to use a vertical schema, but rather to extend the RDBMS tuple storage format to allow the representation of sparse attributes as interpreted fields. The addition of interpreted storage allows for efficient and transparent querying of sparse data, uniform access to all attributes, and schema scalability. We show, through an implementation in PostgreSQL, that the interpreted storage approach dominates in query efficiency and ease-of-use over the current horizontal storage and vertical schema approaches over a wide range of queries and sparse data sets.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"102 1","pages":"58-58"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76102059","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Ayad, J. Naughton, Stephen J. Wright, U. Srivastava
{"title":"Approximating StreamingWindow Joins Under CPU Limitations","authors":"A. Ayad, J. Naughton, Stephen J. Wright, U. Srivastava","doi":"10.1109/ICDE.2006.24","DOIUrl":"https://doi.org/10.1109/ICDE.2006.24","url":null,"abstract":"Data streaming systems face the possibility of having to shed load in the case of CPU or memory resource limitations. We study the CPU limited scenario in detail. First, we propose a new model for the CPU cost. Then we formally state the problem of shedding load for the goal of obtaining the maximum possible subset of the complete answer, and propose an online strategy for semantic load shedding. Moving on to random load shedding, we discuss random load shedding strategies that decouple the window maintenance and tuple production operations of the symmetric hash join, and prove that one of them — Probe-No-Insert — always dominates the previously proposed coin flipping strategy.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"1 1","pages":"142-142"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79528380","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}