Michael Daum, F. Lauterwald, P. Baumgärtel, Niko Pollner, K. Meyer-Wegener
{"title":"Black-box determination of cost models' parameters for federated stream-processing systems","authors":"Michael Daum, F. Lauterwald, P. Baumgärtel, Niko Pollner, K. Meyer-Wegener","doi":"10.1145/2076623.2076654","DOIUrl":"https://doi.org/10.1145/2076623.2076654","url":null,"abstract":"For distribution and deployment of queries in distributed stream-processing environments, it is vital to estimate the expected costs in advance. Having heterogeneous Stream-Processing Systems (SPSs) running on various hosts, the parameters of a cost model for an operator must be determined by measurements for each relevant combination of an SPS and hardware. This paper presents a black-box method that determines the parameters of appropriate cost models that regard system-specific behavior. For some SPSs, there might not be any appropriate cost model available due to the lack of internal knowledge. If no cost model is available for any reason, we provide and apply a non-parametric model.","PeriodicalId":93615,"journal":{"name":"Proceedings. International Database Engineering and Applications Symposium","volume":"26 1","pages":"226-232"},"PeriodicalIF":0.0,"publicationDate":"2011-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84036106","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Martin V. Jørgensen, René Bech Rasmussen, Simonas Šaltenis, Carsten Schjønning
{"title":"FB-tree: a B+-tree for flash-based SSDs","authors":"Martin V. Jørgensen, René Bech Rasmussen, Simonas Šaltenis, Carsten Schjønning","doi":"10.1145/2076623.2076629","DOIUrl":"https://doi.org/10.1145/2076623.2076629","url":null,"abstract":"Due to their many advantages, flash-based SSDs (Solid-State Drives) have become a mainstream alternative to magnetic disks for database servers. Nevertheless, database systems, designed and optimized for magnetic disks, still do not fully exploit all the benefits of the new technology. We propose the FB-tree: a combination of an adapted B+-tree, a storage manager, and a buffer manager, all optimized for modern SSDs. Together the techniques enable writing to SSDs in relatively large blocks, thus achieving greater overall throughput. This is achieved by the out-of-place writing, whereby every time a modified index node is written, it is written to a new address, clustered with some other nodes that are written together. While this constantly frees index nodes, the FB-tree does not introduce any garbage-collection overhead, instead relying on naturally occurring free-space segments of sufficient size. As a consequence, the FB-tree outperforms a regular B+-tree in all scenarios tested. For instance, the throughput of a random workload of 75% updates increases by a factor of three using only two times the space of the B+-tree.","PeriodicalId":93615,"journal":{"name":"Proceedings. International Database Engineering and Applications Symposium","volume":"68 1","pages":"34-42"},"PeriodicalIF":0.0,"publicationDate":"2011-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91349463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using an inference mechanism for helping the data integration","authors":"V. Pequeno, J. Pires","doi":"10.1145/2076623.2076660","DOIUrl":"https://doi.org/10.1145/2076623.2076660","url":null,"abstract":"Sharing and integrating information across multiple autonomous and heterogeneous data sources has emerged as a strategic requirement in modern business. We deal with this problem by proposing a declarative approach based on the creation of a reference model and perspective schemata. The former serves as a common semantic meta-model, while the latter defines correspondence between schemata. Furthermore, using the proposed architecture, we developed an inference mechanism which allows the (semi-) automatic derivation of new mappings between schemata from previous ones. The aim of this paper is to present the proposed inference mechanism.","PeriodicalId":93615,"journal":{"name":"Proceedings. International Database Engineering and Applications Symposium","volume":"22 1","pages":"251-253"},"PeriodicalIF":0.0,"publicationDate":"2011-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78161131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A predictable storage model for scalable parallel DW","authors":"J. Costa, J. Cecílio, P. Martins, P. Furtado","doi":"10.1145/2076623.2076628","DOIUrl":"https://doi.org/10.1145/2076623.2076628","url":null,"abstract":"The star schema model has been widely used as the de facto DW storage organization on RDBMS. Business measures are stored in a central fact table along with a set of foreign keys referencing dimension tables. While this storage organization offers a good trade-off between storage size and performance for a single node, it doesn't scale in a predictable manner in shared-nothing parallel architectures. Although fact tables can be linearly partitioned among nodes, the same doesn't apply to dimensions, which unbalances (increases) the dimensions/fact_table size ratio, and consequently introduces limits to the number of parallel nodes. In this paper we propose and evaluate a parallel DW storage model that overcomes these limitations and delivers optimal speed-up and scale-up capabilities with top efficiency. We use the TPC-H benchmark to evaluate the scalability and efficiency of the proposed model.","PeriodicalId":93615,"journal":{"name":"Proceedings. International Database Engineering and Applications Symposium","volume":"29 1","pages":"26-33"},"PeriodicalIF":0.0,"publicationDate":"2011-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76558708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A family of graph-theory-driven algorithms for managing complex probabilistic graph data efficiently","authors":"A. Cuzzocrea, Paolo Serafino","doi":"10.1145/2076623.2076657","DOIUrl":"https://doi.org/10.1145/2076623.2076657","url":null,"abstract":"Traditionally, a great deal of attention has been devoted to the problem of effectively modeling and querying probabilistic graph data. State-of-the-art proposals are not prone to deal with complex probabilistic data, as they essentially introduce simple data models (e.g., based on confidence intervals) and straightforward query methodologies (e.g., based on the reachability property). According to our vision, these proposals need to be extended towards achieving the definition of innovative models and algorithms capable of dealing with the hardness of novel requirements posed by managing complex probabilistic graph data efficiently. Inspired by this main motivation, in this paper we propose and experimentally assess an innovative family of graph-theory-driven algorithms for managing complex probabilistic graph data, whose main double-fold goal consists in enhancing the expressive power of the underlying probabilistic graph data model and the expressive power of graph queries.","PeriodicalId":93615,"journal":{"name":"Proceedings. International Database Engineering and Applications Symposium","volume":"13 1","pages":"240-242"},"PeriodicalIF":0.0,"publicationDate":"2011-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82002829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Aggregates and priorities in P2P data management systems","authors":"Luciano Caroprese, E. Zumpano","doi":"10.1145/2076623.2076625","DOIUrl":"https://doi.org/10.1145/2076623.2076625","url":null,"abstract":"This paper investigates the data exchange problem among distributed independent sources. It is based on previous works of the authors [11, 12, 14] in which a declarative semantics for P2P systems has been presented and a mechanism to set different degrees of reliability for neighbor peers has been provided. The basic semantics for P2P systems defines the concept of Maximal Weak Models (in [11, 12, 14] these models have been called Preferred Weak Models; in this paper we rename them and use the term Preferred for the subclass of Weak Models defined here) that represent scenarios in which maximal sets of facts not violating integrity constraints are imported into the peers [11, 12]. The previous priority mechanism defined in [14] is rigid in the sense that the preference between conflicting sets of atoms that a peer can import depends only on the priorities associated with the source peers at design time. In this paper we present a different framework that allows selecting among different scenarios by looking at the properties of the data provided by the peers. The framework presented here allows modeling concepts like \"in the case of conflicting information, it is preferable to import data from the neighbor peer that can provide the maximum number of tuples\" or \"in the case of conflicting information, it is preferable to import data from the neighbor peer such that the sum of the values of an attribute is minimum\" without selecting a-priori preferred peers. To enforce this preference mechanism we enrich the previous P2P framework with aggregate functions and present significant examples showing the flexibility of the new framework.","PeriodicalId":93615,"journal":{"name":"Proceedings. International Database Engineering and Applications Symposium","volume":"54 1","pages":"1-7"},"PeriodicalIF":0.0,"publicationDate":"2011-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90248729","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Online outlier detection for data streams","authors":"Md. Shiblee Sadik, L. Gruenwald","doi":"10.1145/2076623.2076635","DOIUrl":"https://doi.org/10.1145/2076623.2076635","url":null,"abstract":"Outlier detection is a well established area of statistics but most of the existing outlier detection techniques are designed for applications where the entire dataset is available for random access. A typical outlier detection technique constructs a standard data distribution or model and identifies the data points that deviate from the model as outliers. Evidently these techniques are not suitable for online data streams where the entire dataset, due to its unbounded volume, is not available for random access. Moreover, the data distribution in data streams changes over time, which challenges the existing outlier detection techniques that assume a constant standard data distribution for the entire dataset. In addition, data streams are characterized by uncertainty, which imposes further complexity. In this paper we propose an adaptive, online outlier detection technique addressing the aforementioned characteristics of data streams, called Adaptive Outlier Detection for Data Streams (A-ODDS), which identifies outliers with respect to all the received data points as well as temporally close data points. The temporally close data points are selected based on time and change of data distribution. We also present an efficient and online implementation of the technique and a performance study showing the superiority of A-ODDS over existing techniques in terms of accuracy and execution time on a real-life dataset collected from meteorological applications.","PeriodicalId":93615,"journal":{"name":"Proceedings. International Database Engineering and Applications Symposium","volume":"47 1","pages":"88-96"},"PeriodicalIF":0.0,"publicationDate":"2011-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91212880","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Boosting tuple propagation in multi-relational classification","authors":"Lucantonio Ghionna, G. Greco","doi":"10.1145/2076623.2076637","DOIUrl":"https://doi.org/10.1145/2076623.2076637","url":null,"abstract":"Multi-relational classification is a mining method aiming at building classifiers for the tuples in some target relation based on its own data as well as on the data possibly dispersed over other non-target relations, by exploiting the relationships among them formalized via foreign key constraints. While improving on the efficacy of the resulting classifiers, propagating data via the foreign key constraints deteriorates the scalability of the underlying algorithm. In the paper, various techniques are discussed to efficiently implement this propagation task, and hence to boost performances of current multi-relational classification algorithms. These techniques are based on suitable adaptations of state-of-the-art query optimization methods, and are conceived to be coupled with database management systems. A system prototype integrating all the techniques is illustrated, and results of experimental activity conducted on top of it are eventually discussed.","PeriodicalId":93615,"journal":{"name":"Proceedings. International Database Engineering and Applications Symposium","volume":"1 1","pages":"106-114"},"PeriodicalIF":0.0,"publicationDate":"2011-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79916460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the expressiveness of generalization rules for XPath query relaxation","authors":"Bettina Fazzinga, S. Flesca, F. Furfaro","doi":"10.1145/1866480.1866504","DOIUrl":"https://doi.org/10.1145/1866480.1866504","url":null,"abstract":"The problem of defining suitable rewriting mechanisms for XML query languages to support approximate query answering has received a great deal of attention in the last few years, owing to its practical impact in several scenarios. For instance, in the typical scenario of distributed XML data without a shared data scheme, accomplishing the extraction of the information of interest often requires queries to be rewritten into relaxed ones, in order to adapt them to the schemes adopted in the different sources. In this paper, rewriting systems for a wide fragment of XPath (which is the core of several languages for manipulating XML data) are investigated, and a general form of rewriting rules (namely, generalization rules) is considered, which subsumes the forms adopted in the most well-known rewriting systems. Specifically, the expressiveness of rewriting systems based on this form of rules is characterized: on the one hand, it is shown that rewriting systems based on generalization rules are incomplete w.r.t. containment (thus, traditional rewriting mechanisms do not suffice to rewrite a query into any more general one). On the other hand, it is also shown that the expressiveness of state-of-the-art rewriting systems can be improved by employing rewriting primitives as simple as those traditionally used, which enable any query to be relaxed into every more general one related to it via homomorphism.","PeriodicalId":93615,"journal":{"name":"Proceedings. International Database Engineering and Applications Symposium","volume":"17 1","pages":"157-168"},"PeriodicalIF":0.0,"publicationDate":"2010-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80976172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An integrative approach to query optimization in native XML database management systems","authors":"A. Weiner, T. Härder","doi":"10.1145/1866480.1866491","DOIUrl":"https://doi.org/10.1145/1866480.1866491","url":null,"abstract":"Even though an effective cost-based query optimizer is of utmost importance for the efficient evaluation of XQuery expressions in native XML database systems, such a component is currently out of sight, because former approaches do not pay attention to the latest advances in the area of physical operators (e.g., Holistic Twig Joins and advanced indexes) or focus on only some of them. To support the development of native XML query optimizers, we introduce an extensible cost-based optimization framework that integrates the cutting-edge XML query evaluation operators into a single system. Using the well-known plan generation techniques from the relational world and a novel set of plan equivalences---which allows for the generation of alternative query plans consisting of Structural Joins, Holistic Twig Joins, and numerous indexes (especially path indexes and content-and-structure indexes)---our optimizer can now benefit from the knowledge on native XML query evaluation to speed up query execution significantly.","PeriodicalId":93615,"journal":{"name":"Proceedings. International Database Engineering and Applications Symposium","volume":"110 1","pages":"64-74"},"PeriodicalIF":0.0,"publicationDate":"2010-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72864377","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}