International Workshop on Data Warehousing and OLAP: Latest Papers

Cardinality estimation in ETL processes
International Workshop on Data Warehousing and OLAP Pub Date : 2009-11-06 DOI: 10.1145/1651291.1651302
Maik Thiele, Tim Kiefer, Wolfgang Lehner
Abstract: The cardinality estimation in ETL processes is particularly difficult. Aside from the well-known SQL operators, which are also used in ETL processes, there are a variety of operators without exact counterparts in the relational world. In addition to those, we find operators that support very specific data integration aspects. For such operators, there are no well-examined statistic approaches for cardinality estimations. Therefore, we propose a black-box approach and estimate the cardinality using a set of statistic models for each operator. We discuss different model granularities and develop an adaptive cardinality estimation framework for ETL processes. We map the abstract model operators to specific statistic learning approaches (regression, decision trees, support vector machines, etc.) and evaluate our cardinality estimations in an extensive experimental study.
Citations: 4
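The black-box idea in this abstract (treat an operator as opaque and learn its output size from past runs) can be illustrated with a minimal sketch. It is not the authors' framework: it assumes a single operator, a plain least-squares line as the statistical model, and a hypothetical `history` of observed (input rows, output rows) pairs.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b.

    Assumes at least two distinct x values; otherwise sxx is zero.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def estimate_cardinality(model, input_rows):
    """Predict the output row count of the operator, never below zero."""
    a, b = model
    return max(0, round(a * input_rows + b))


# Hypothetical observations from earlier runs of one ETL operator:
# (rows fed into the operator, rows it produced).
history = [(1000, 250), (2000, 500), (4000, 1000)]
model = fit_line([x for x, _ in history], [y for _, y in history])
predicted = estimate_cardinality(model, 8000)
```

Refitting the model as new observations arrive would be one simple way to make such an estimator adaptive; the abstract also names decision trees and support vector machines as alternative learners.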
Consistency-aware evaluation of OLAP queries in replicated data warehouses
International Workshop on Data Warehousing and OLAP Pub Date : 2009-11-06 DOI: 10.1145/1651291.1651305
Javier García-García, C. Ordonez
Abstract: OLAP tools for distributed data warehouses generally assume underlying replicated tables are up to date. Unfortunately, maintaining updated replicas is difficult due to the inherent tradeoff between consistency and availability. In this paper, we propose techniques to evaluate OLAP queries in distributed data warehouses assuming a lazy replication model. Considering that it may be admissible to evaluate OLAP queries with slightly outdated replicated tables, our technique first efficiently computes the degree of obsolescence of replicated local tables and when such result is acceptable, given an error threshold, then the query is evaluated locally, avoiding the transmission of large tables over the network. Otherwise, the query can be remotely evaluated less efficiently with the master copy of tables, provided they are stored at a single site. Inconsistency measurement is computed by adapting distributed set reconciliation algorithms to efficiently compute the symmetric difference between the master and replicated tables. Our improved distributed database algorithm has linear communication complexity and cubic time complexity in the size of the symmetric difference, which is expected to be small in a replicated data warehouse. Our technique is independent of the method employed to propagate data warehouse insertions, deletions and updates. We present experiments simulating distributed databases, with different CPU and transmission speeds, showing our method is effective to decide if the query should be evaluated either locally or remotely.
Citations: 4
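The local-versus-remote decision described in this abstract can be sketched as follows. This is only an illustration under simplifying assumptions: both key sets are held in memory at one site, whereas the paper obtains the symmetric difference through distributed set reconciliation without shipping the tables. The function names and the ratio used as the obsolescence measure are hypothetical.

```python
def obsolescence(master_keys, replica_keys):
    """Size of the symmetric difference relative to the master table."""
    diff = master_keys ^ replica_keys  # rows present in exactly one copy
    return len(diff) / max(len(master_keys), 1)


def choose_site(master_keys, replica_keys, threshold):
    """Evaluate locally when the replica is fresh enough, else remotely."""
    if obsolescence(master_keys, replica_keys) <= threshold:
        return "local"
    return "remote"


# Hypothetical row keys: the replica misses key 10 and still holds
# key 11, which has since been deleted from the master.
master = set(range(1, 11))
replica = set(range(1, 10)) | {11}
```

With these sets the symmetric difference is `{10, 11}`, so a threshold of 0.25 keeps the query local while a threshold of 0.1 sends it to the master site.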
View usability and safety for the answering of top-k queries via materialized views
International Workshop on Data Warehousing and OLAP Pub Date : 2009-11-06 DOI: 10.1145/1651291.1651308
Eftychia Baikousi, Panos Vassiliadis
Abstract: In this paper, we investigate the problem of answering top-k queries via materialized views. We provide theoretical guarantees for the adequacy of a view to answer a top-k query, along with algorithmic techniques to compute the query via a view when this is possible. We explore the problem of answering a query via a combination of more than one view and show that it is impossible to improve our theoretical guarantees for the answering of a query via a combination of views. Finally, we experimentally assess our approach for its effectiveness and efficiency.
Citations: 11
Generating data quality rules and integration into ETL process
International Workshop on Data Warehousing and OLAP Pub Date : 2009-11-06 DOI: 10.1145/1651291.1651303
J. Rodic, M. Baranović
Abstract: Many data quality projects are integrated into data warehouse projects without enough time allocated for the data quality part, which leads to a need for a quicker data quality process implementation that can be easily adopted as the first stage of data warehouse implementation. We will see that many data quality rules can be implemented in a similar way, and thus generated based on metadata tables that store information about the rules. These generated rules are then used to check data in designated tables and mark erroneous records, or to do certain updates of invalid data. We will also store information about the rules violations in order to provide analysis of such data. This could give a significant insight into our source systems. Entire data quality process will be integrated into ETL process in order to achieve load of data warehouse that is as automated, as correct and as quick as possible. Only small number of records would be left for manual inspection and reprocessing.
Citations: 26
Defining ETL worfklows using BPMN and BPEL
International Workshop on Data Warehousing and OLAP Pub Date : 2009-11-06 DOI: 10.1145/1651291.1651299
Z. E. Akkaoui, E. Zimányi
Abstract: Decisional systems are crucial for enterprise improvement. They allow the consolidation of heterogeneous data from distributed enterprise data stores into strategic indicators. An essential component of this data consolidation is the Extract, Transform, and Load (ETL) process. In the research literature there has been very few work defining conceptual models for ETL processes. At the same time, there are currently many tools that manage such processes. However, each tool uses its own model, which is not necessarily able to communicate with the models of other tools. In this paper, we propose a platform-independent conceptual model of ETL processes based on the Business Process Model Notation (BPMN) standard. We also show how such a conceptual model can be implemented using Business Process Execution Language (BPEL), a standard executable language for specifying interactions with web services.
Citations: 97
A comprehensive approach to data warehouse testing
International Workshop on Data Warehousing and OLAP Pub Date : 2009-11-06 DOI: 10.1145/1651291.1651295
M. Golfarelli, S. Rizzi
Abstract: Testing is an essential part of the design life-cycle of any software product. Nevertheless, while most phases of data warehouse design have received considerable attention in the literature, not much has been said about data warehouse testing. In this paper we introduce a number of data mart-specific testing activities, we classify them in terms of what is tested and how it is tested, and we discuss how they can be framed within a reference design methodology.
Citations: 48
Discovering functional dependencies for multidimensional design
International Workshop on Data Warehousing and OLAP Pub Date : 2009-11-06 DOI: 10.1145/1651291.1651293
Oscar Romero, Diego Calvanese, A. Abelló, M. Rodriguez-Muro
Abstract: Nowadays, it is widely accepted that the data warehouse design task should be largely automated. Furthermore, the data warehouse conceptual schema must be structured according to the multidimensional model and as a consequence, the most common way to automatically look for subjects and dimensions of analysis is by discovering functional dependencies (as dimensions functionally depend on the fact) over the data sources. Most advanced methods for automating the design of the data warehouse carry out this process from relational OLTP systems, assuming that a RDBMS is the most common kind of data source we may find, and taking as starting point a relational schema. In contrast, in our approach we propose to rely instead on a conceptual representation of the domain of interest formalized through a domain ontology expressed in the DL-Lite Description Logic. We propose an algorithm to discover functional dependencies from the domain ontology that exploits the inference capabilities of DL-Lite, thus fully taking into account the semantics of the domain. We also provide an evaluation of our approach in a real-world scenario.
Citations: 50
Automatic generation of ETL processes from conceptual models
International Workshop on Data Warehousing and OLAP Pub Date : 2009-11-06 DOI: 10.1145/1651291.1651298
Lilia Muñoz, J. Mazón, J. Trujillo
Abstract: Data warehouses (DW) integrate different data sources in order to give a multidimensional view of them to the decision-maker. To this aim, the ETL (Extraction, Transformation and Load) processes are responsible for extracting data from heterogeneous operational data sources, their transformation (conversion, cleaning, standardization, etc.), and its load in the DW. In recent years, several conceptual modeling approaches have been proposed for designing ETL processes. Although these approaches are very useful for documenting ETL processes and supporting the designer tasks, these proposals fail to give mechanisms to carry out an automatic code generation stage. Such a stage should be required to both avoid fails and save development time in the implementation of complex ETL process. Therefore, in this paper we define an approach for the automatic code generation of ETL processes. To this aim, we align the modeling of ETL processes in DW with MDA (Model Driven Architecture) by formally defining a set of QVT (Query, View, Transformation) transformations.
Citations: 73
A set of aggregation functions for spatial measures
International Workshop on Data Warehousing and OLAP Pub Date : 2008-10-30 DOI: 10.1145/1458432.1458438
J. Silva, V. Times, A. Salgado, Clenúbio Souza, R. Fidalgo, A. Oliveira
Abstract: A number of studies have been developed in recent years aimed at integrating pertinent concepts and technologies for analytical multidimensional (OLAP) and geographic (GIS) processing environments. This type of integrated environment has been identified as SOLAP (Spatial OLAP). However, due to the fact that these two technologies were conceived with different purposes in mind, the interaction of the two environments is not an easy task and even with so much research being developed, there remain unresolved issues that merit exploration. One such issue refers to aggregation functions for measures. These functions are currently used in the definition of multidimensional and geographic data cubes. The aim of this paper is to present a set of aggregation functions for geographic measures. We also show these functions in practice, by taking into account their use with a SOLAP architecture prototype. This SOLAP prototype is based on a model for Geographic Data Warehouse (GDW), a data cube model and a geographic multidimensional query language.
Citations: 23
Efficient OLAP with UDFs
International Workshop on Data Warehousing and OLAP Pub Date : 2008-10-30 DOI: 10.1145/1458432.1458440
Zhibo Chen, C. Ordonez
Abstract: Since the early 1990s, On-Line Analytical Processing (OLAP) has been a well studied research topic that has focused on implementation outside the database, either with OLAP servers or entirely within the client computers. Our approach involves the computation and storage of OLAP cubes using User-Defined Functions (UDF) with a database management system. UDFs offer users a chance to write their own code that can then called like any other standard SQL function. By generating OLAP cubes within a UDF, we are able to create the entire lattice in main memory. The UDF also allows the user to assert more control over the actual generation process than when using standard OLAP functions such as the CUBE operator. We introduce a data structure that can not only efficiently create an OLAP lattice in main memory, but also be adapted to generate association rule itemsets with minimal change. We experimentally show that the UDF approach is more efficient than SQL using one real dataset and a synthetic dataset. Also, we present several experiments showing that generating association rule itemsets using the UDF approach is comparable to a SQL approach. In this paper, we show that techniques such as OLAP and association rules can be efficiently pushed into the UDF, and has better performance, in most cases, compared to standard SQL functions.
Citations: 23
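To make the notion of building the entire lattice in main memory concrete, the sketch below computes every group-by of a small fact table, which is what the SQL CUBE operator produces. It is plain Python rather than a database UDF, and it is not the data structure introduced in the paper; the `cube` function, the column names, and the sample rows are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cube(rows, dims, measure):
    """Sum `measure` for every subset of `dims` (2**len(dims) group-bys)."""
    lattice = {}
    for k in range(len(dims) + 1):
        for group in combinations(dims, k):
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in group)
                totals[key] += row[measure]
            lattice[group] = dict(totals)
    return lattice


# Hypothetical fact table with two dimensions and one measure.
rows = [
    {"region": "EU", "product": "A", "sales": 10},
    {"region": "EU", "product": "B", "sales": 5},
    {"region": "US", "product": "A", "sales": 7},
]
lattice = cube(rows, ("region", "product"), "sales")
```

Each key of `lattice` is one node of the lattice: `()` holds the grand total, `("region",)` and `("product",)` hold the one-dimensional roll-ups, and `("region", "product")` holds the finest level. This naive version rescans the rows for every node; an efficient implementation would derive coarser nodes from finer ones.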