International Workshop on Data Warehousing and OLAP: Latest Publications

Bijoux: Data Generator for Evaluating ETL Process Quality
International Workshop on Data Warehousing and OLAP Pub Date : 2014-11-07 DOI: 10.1145/2666158.2666183
Emona Nakuçi, V. Theodorou, P. Jovanovic, A. Abelló
{"title":"Bijoux: Data Generator for Evaluating ETL Process Quality","authors":"Emona Nakuçi, V. Theodorou, P. Jovanovic, A. Abelló","doi":"10.1145/2666158.2666183","DOIUrl":"https://doi.org/10.1145/2666158.2666183","url":null,"abstract":"Obtaining the right set of data for evaluating the fulfillment of different quality standards in the extract-transform-load (ETL) process design is rather challenging. First, the real data might be out of reach due to different privacy constraints, while providing a synthetic set of data is known as a labor-intensive task that needs to take various combinations of process parameters into account. Additionally, having a single dataset usually does not represent the evolution of data throughout the complete process lifespan, hence missing the plethora of possible test cases. To facilitate such demanding task, in this paper we propose an automatic data generator (i.e., Bijoux). Starting from a given ETL process model, Bijoux extracts the semantics of data transformations, analyzes the constraints they imply over data, and automatically generates testing datasets. At the same time, it considers different dataset and transformation characteristics (e.g., size, distribution, selectivity, etc.) in order to cover a variety of test scenarios. We report our experimental findings showing the effectiveness and scalability of our approach.","PeriodicalId":335396,"journal":{"name":"International Workshop on Data Warehousing and OLAP","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133091192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
An Advanced Data Warehouse for Integrating Large Sets of GPS Data
International Workshop on Data Warehousing and OLAP Pub Date : 2014-11-07 DOI: 10.1145/2666158.2666172
O. Andersen, Benjamin B. Krogh, Christian Thomsen, K. Torp
{"title":"An Advanced Data Warehouse for Integrating Large Sets of GPS Data","authors":"O. Andersen, Benjamin B. Krogh, Christian Thomsen, K. Torp","doi":"10.1145/2666158.2666172","DOIUrl":"https://doi.org/10.1145/2666158.2666172","url":null,"abstract":"GPS data recorded from driving vehicles is available from many sources and is a very good data foundation for answering traffic related queries. However, most approaches so far have not considered combining GPS data from many sources into a single data warehouse. Further, the integration of GPS data with fuel consumption data (from the so-called CAN bus in the vehicles) and weather data has not been done. In this paper, we propose a data warehouse design for handling GPS data, fuel consumption data, and weather data. The design is fully implemented in a running system using the PostgreSQL DBMS. The system has been in production since March 2011 and the main fact table contains today approximately 3.4 billion rows from 16 different data sources. We show that the system can be used for a number of novel traffic related analyses such as relating the fuel consumption of vehicles with the road network and road congestion.","PeriodicalId":335396,"journal":{"name":"International Workshop on Data Warehousing and OLAP","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116622660","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
Big Graph Analytics: The State of the Art and Future Research Agenda
International Workshop on Data Warehousing and OLAP Pub Date : 2014-11-07 DOI: 10.1145/2666158.2668454
A. Cuzzocrea, I. Song
{"title":"Big Graph Analytics: The State of the Art and Future Research Agenda","authors":"A. Cuzzocrea, I. Song","doi":"10.1145/2666158.2668454","DOIUrl":"https://doi.org/10.1145/2666158.2668454","url":null,"abstract":"Analytics over big graphs is becoming a first-class challenge in database research, with fast-growing interest from both the academia and the industrial community. This problem arises in several application scenarios, ranging from social networks to large-scale network systems, from knowledge discovery to cybersecurity, and so forth. Following this major trend, this paper explores actual state-of-the-art results in the area of analytics over big graphs and discusses open research issues and actual trends in such area.","PeriodicalId":335396,"journal":{"name":"International Workshop on Data Warehousing and OLAP","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129407081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 52
GOLAM: A Framework for Analyzing Genomic Data
International Workshop on Data Warehousing and OLAP Pub Date : 2014-11-07 DOI: 10.1145/2666158.2666175
Lorenzo Baldacci, M. Golfarelli, Simone Graziani, S. Rizzi
{"title":"GOLAM: A Framework for Analyzing Genomic Data","authors":"Lorenzo Baldacci, M. Golfarelli, Simone Graziani, S. Rizzi","doi":"10.1145/2666158.2666175","DOIUrl":"https://doi.org/10.1145/2666158.2666175","url":null,"abstract":"The emerging medical models aim at leveraging on high-throughput genome sequencing technologies to better target drugs to patients' personal profiles so as to increase their effectiveness. However, the huge amount of data made available by these technologies calls for sophisticated and automated analysis techniques. In this direction we present GOLAM, a framework for OLAP analysis and mining of matches between genomic regions extracted from ENCODE, a worldwide-available collection of shared genomic data. The goal of GOLAM is to overcome the current limitations of genome analysis methods, that are normally based on browsing. This is done by partially automating and speeding-up the analysis process on the one hand, by making it more flexible and introducing a multi-resolution view of data on the other. The framework has been partially implemented so far; in this paper we focus on conveying its potential and on describing its functional architecture and the underlying data models.","PeriodicalId":335396,"journal":{"name":"International Workshop on Data Warehousing and OLAP","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125471336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Can we analyze big data inside a DBMS?
International Workshop on Data Warehousing and OLAP Pub Date : 2013-10-28 DOI: 10.1145/2513190.2513198
C. Ordonez
{"title":"Can we analyze big data inside a DBMS?","authors":"C. Ordonez","doi":"10.1145/2513190.2513198","DOIUrl":"https://doi.org/10.1145/2513190.2513198","url":null,"abstract":"Relational DBMSs remain the main data management technology, despite the big data analytics and no-SQL waves. On the other hand, for data analytics in a broad sense, there are plenty of non-DBMS tools including statistical languages, matrix packages, generic data mining programs and large-scale parallel systems, being the main technology for big data analytics. Such large-scale systems are mostly based on the Hadoop distributed file system and MapReduce. Thus it would seem a DBMS is not a good technology to analyze big data, going beyond SQL queries, acting just as a reliable and fast data repository. In this survey, we argue that is not the case, explaining important research that has enabled analytics on large databases inside a DBMS. However, we also argue DBMSs cannot compete with parallel systems like MapReduce to analyze web-scale text data. Therefore, each technology will keep influencing each other. We conclude with a proposal of long-term research issues, considering the \"big data analytics\" trend.","PeriodicalId":335396,"journal":{"name":"International Workshop on Data Warehousing and OLAP","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130100681","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 25
Using REO on ETL conceptual modelling: a first approach
International Workshop on Data Warehousing and OLAP Pub Date : 2013-10-28 DOI: 10.1145/2513190.2513202
Bruno Oliveira, O. Belo
{"title":"Using REO on ETL conceptual modelling: a first approach","authors":"Bruno Oliveira, O. Belo","doi":"10.1145/2513190.2513202","DOIUrl":"https://doi.org/10.1145/2513190.2513202","url":null,"abstract":"The formalization of software patterns has proven to be very useful in software developing, improving systems communication, data interchange across platforms, and simplifying the integration of processes and data flows. Populating a data warehouse (ETL) is often a very complex task demanding significant computational resources. It faces many drawbacks during its design and implementation, involving not only large volumes of data that must be processed but also undesirable change of business requirements. All of this leads frequently to reuse significant parts of other ETL implementations, adapting data structures and processes to comply with new requirements. Additionally, we believe that it's necessary a more simply and reliable approach for ETL conceptual modelling covering the \"lack of mature\" of this important part of ETL development. In this paper we explored a new approach to ETL conceptual modelling using the Reo coordination language, trying to evaluate its adequacy and expressiveness on the coordination of ETL tasks. A pattern-based approach was designed to map typical operations used in real world ETL scenarios from an initial Reo specification. For demonstration purposes, we present and discuss as two case studies, a slowly changing dimension and a surrogated key pipelining processes.","PeriodicalId":335396,"journal":{"name":"International Workshop on Data Warehousing and OLAP","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127808087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 14
INDREX: in-database distributional relation extraction
International Workshop on Data Warehousing and OLAP Pub Date : 2013-10-28 DOI: 10.1145/2513190.2513196
T. Kilias, Alexander Löser, Periklis Andritsos
{"title":"INDREX: in-database distributional relation extraction","authors":"T. Kilias, Alexander Löser, Periklis Andritsos","doi":"10.1145/2513190.2513196","DOIUrl":"https://doi.org/10.1145/2513190.2513196","url":null,"abstract":"Relation extraction transforms the textual representation of a relationship into the relational model of a data warehouse. Early systems, such as SystemT by IBM or the open source system GATE solve this task with handcrafted rule sets that the system executes document-by-document. Thereby the user must execute a highly interactive and iterative process of reading a document, of expressing rules, of testing these rules on the next document and of refining rules. Until now, these systems do neither leverage the full potential of built-in declarative query languages nor the indexing and query optimization techniques of a modern RDBMS that would enable a user interactive rule refinement across documents and on the entire corpus. We propose the INDREX system that enables a user for the first time to describe corpus-wide extraction tasks in a declarative language and permits the user to run interactive rule refinement queries. For enabling this powerful functionality we extend a standard PostgreSQL with a set of white-box user-defined functions that enable corpus-wide transformations from sentences into relationships. We store the text corpus and rules in the same RDBMS that already holds domain specific structured data. As a result, (1) the user can leverage this data to further adapt rules to the target domain, (2) the user does not need an additional system for rule extraction and (3) the INDREX system can leverage the full power of built-in indexing and query optimization techniques of the underlaying RDBMS. In a preliminary study we report on the feasibility of this disruptive approach and show multiple queries in INDREX on the Reuters Corpus, Volume 1.","PeriodicalId":335396,"journal":{"name":"International Workshop on Data Warehousing and OLAP","volume":"215 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116823655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Meta-stars: multidimensional modeling for social business intelligence
International Workshop on Data Warehousing and OLAP Pub Date : 2013-10-28 DOI: 10.1145/2513190.2513195
E. Gallinucci, M. Golfarelli, S. Rizzi
{"title":"Meta-stars: multidimensional modeling for social business intelligence","authors":"E. Gallinucci, M. Golfarelli, S. Rizzi","doi":"10.1145/2513190.2513195","DOIUrl":"https://doi.org/10.1145/2513190.2513195","url":null,"abstract":"Social business intelligence is the discipline of combining corporate data with user-generated content (UGC) to let decision-makers improve their business based on the trends perceived from the environment. A key role in the analysis of textual UGC is played by topics, meant as specific concepts of interest within a subject area. To enable aggregations of topics at different levels, a topic hierarchy is to be defined. Some attempts have been made to address some of the peculiarities of topic hierarchies, but no comprehensive solution has been found so far. The approach we propose to model topic hierarchies in ROLAP systems is called meta-stars. Its basic idea is to use meta-modeling coupled with navigation tables and with traditional dimension tables: navigation tables support hierarchy instances with different lengths and with non-leaf facts, and allow different roll-up semantics to be explicitly annotated; meta-modeling enables hierarchy heterogeneity and dynamics to be accommodated; dimension tables are easily integrated with standard business hierarchies. After outlining a reference architecture for social business intelligence and describing the meta-star approach, we discuss its effectiveness and efficiency by showing its querying expressiveness and by presenting some experimental results for query performances.","PeriodicalId":335396,"journal":{"name":"International Workshop on Data Warehousing and OLAP","volume":"2011 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131871642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 24
CXT-cube: contextual text cube model and aggregation operator for text OLAP
International Workshop on Data Warehousing and OLAP Pub Date : 2013-10-28 DOI: 10.1145/2513190.2513201
Lamia Oukid, Ounas Asfari, F. Bentayeb, N. Benblidia, Omar Boussaïd
{"title":"CXT-cube: contextual text cube model and aggregation operator for text OLAP","authors":"Lamia Oukid, Ounas Asfari, F. Bentayeb, N. Benblidia, Omar Boussaïd","doi":"10.1145/2513190.2513201","DOIUrl":"https://doi.org/10.1145/2513190.2513201","url":null,"abstract":"Traditional data warehousing technologies and On-Line Analytical Processing (OLAP) are unable to analyze textual data. Moreover, as OLAP queries of a decision-maker are generally related to a context, contextual information must be taken into account during the exploitation of data warehouses. Thus, we propose a contextual text cube model denoted CXT-Cube which considers several contextual factors during the OLAP analysis in order to better consider the contextual information associated with textual data. CXT-Cube is characterized by several contextual dimensions, each one related to a contextual factor. In addition, we extend our aggregation OLAP operator for textual data ORank (OLAP-Rank) to consider all the contextual factors defined in our CXT-Cube model. To validate our model, we perform an experimental study and the preliminary results show the importance of our approach for integrating textual data into a data warehouse and improving the decision-making.","PeriodicalId":335396,"journal":{"name":"International Workshop on Data Warehousing and OLAP","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125294573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 28
Lazy data structure maintenance for main-memory analytics over sliding windows
International Workshop on Data Warehousing and OLAP Pub Date : 2013-10-28 DOI: 10.1145/2513190.2513203
Chang Ge, Lukasz Golab
{"title":"Lazy data structure maintenance for main-memory analytics over sliding windows","authors":"Chang Ge, Lukasz Golab","doi":"10.1145/2513190.2513203","DOIUrl":"https://doi.org/10.1145/2513190.2513203","url":null,"abstract":"We address the problem of maintaining data structures used by memory-resident data warehouses that store sliding windows. We propose a framework that eagerly expires data from the sliding window to save space and/or satisfy data retention policies, but lazily maintains the associated data structures to reduce maintenance overhead. Using a dictionary as an example, we show that our framework enables maintenance algorithms that outperform existing approaches in terms of space overhead, maintenance overhead, and dictionary lookup overhead during query execution.","PeriodicalId":335396,"journal":{"name":"International Workshop on Data Warehousing and OLAP","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129961469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2