International Workshop on Data Warehousing and OLAP: Latest Publications

Data warehousing and OLAP over big data: current challenges and future research directions
International Workshop on Data Warehousing and OLAP · Pub Date: 2013-10-28 · DOI: 10.1145/2513190.2517828
A. Cuzzocrea, Ladjel Bellatreche, I. Song
Abstract: In this paper, we highlight open problems and current research trends in the field of Data Warehousing and OLAP over Big Data, an emerging topic in Data Warehousing and OLAP research. We also derive several novel research directions arising in this field, and put emphasis on possible contributions to be achieved by future research efforts.
Citations: 125
Slowly changing measures
International Workshop on Data Warehousing and OLAP · Pub Date: 2013-10-28 · DOI: 10.1145/2513190.2513194
M. Goller, Stefan Berger
Abstract: In data warehousing, measures such as net sales, customer reliability scores, churn likelihood, or sentiment indices are transactional data scored from business events by measurement functions. Dimensions model subject-oriented data used as analysis perspectives when interpreting the measures. While measures and measurement functions are traditionally regarded as stable within the Data Warehouse (DW) schema, the well-known design concept of slowly changing dimensions (SCDs) supports evolving dimension data. SCDs preserve a history of evolving dimension instances, and thus allow tracing and reconstructing the correct dimensional context of all measures in the cube over time.

Measures are also subject to change if DW designers (i) update the underlying measurement function as a whole, or (ii) fine-tune the function parameters. In both scenarios, the changes must be obvious to business analysts; otherwise the changed semantics leads to incomparable measure values, and thus unsound and worthless analysis results.

To handle measure evolution properly, this paper proposes Slowly Changing Measures (SCMs) as an additional DW design concept that prevents incomparable measures. Its core idea is to avoid excessive schema updates despite regular changes to measure semantics through precautious design, handling the changes mostly at the instance level. The paper introduces four SCM types, each with different strengths regarding various practical requirements, including an optional historical track of measure definitions to enable cross-version queries. The approach considers stable business events under normal loading delays of measurements, and the standard temporality model based on the inherent occurrence time of facts. Furthermore, the SCMs concept applies universally to both flow and stock measure semantics.
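The paper's four SCM types are not detailed in this abstract, but the optional historical track of measure definitions for cross-version queries resembles the familiar Type-2 history pattern from SCDs. A minimal, hypothetical Python sketch of such a version history (names and structure are illustrative assumptions, not taken from the paper):

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class MeasureVersion:
    name: str                         # measure name, e.g. "customer_reliability"
    definition: str                   # textual/SQL form of the measurement function
    valid_from: date
    valid_to: Optional[date] = None   # None marks the currently valid version

def update_measure(history: list, name: str,
                   new_definition: str, effective: date) -> None:
    """Retire the current version and open a new one (Type-2-style history)."""
    for v in history:
        if v.name == name and v.valid_to is None:
            v.valid_to = effective            # close the old definition
    history.append(MeasureVersion(name, new_definition, effective))

def definition_at(history, name, day):
    """Resolve which definition was in force on a given day (cross-version query)."""
    for v in history:
        if (v.name == name and v.valid_from <= day
                and (v.valid_to is None or day < v.valid_to)):
            return v.definition
    return None
```

With this history, analysts can see at query time which measurement function produced a given measure value, which is the comparability problem the SCM concept targets.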
Citations: 8
CineCubes: cubes as movie stars with little effort
International Workshop on Data Warehousing and OLAP · Pub Date: 2013-10-28 · DOI: 10.1145/2513190.2513191
Dimitrios Gkesoulis, Panos Vassiliadis
Abstract: In this paper we investigate how we can exploit the existence of a star schema in order to answer user OLAP queries with CineCube movies. Our method, implemented in an actual system, includes the following steps. The user submits a query over an underlying star schema. Taking this query as input, the system comes up with a set of queries complementing the information content of the original query, and executes them. Then, the system visualizes the query results and accompanies this presentation with a text commenting on the result highlights. Moreover, via text-to-speech conversion the system automatically produces audio for the constructed text. Each combination of visualization, text, and audio practically constitutes a cube movie, which is wrapped as a PowerPoint presentation and returned to the user.
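The abstract does not spell out how the complementary queries are derived; one plausible, simplified reading is that a selection on one dimension value is complemented by the same query over the sibling values of that dimension, so the original result can be put in context. A hedged sketch (the function and query template are illustrative assumptions, not the CineCubes API):

```python
def complementary_queries(fact_table, measure, dim_col, selected_value, all_values):
    """Given an OLAP selection on one dimension value, generate 'sibling'
    queries over the other values of the same dimension, complementing the
    information content of the original query."""
    template = "SELECT {d}, SUM({m}) FROM {t} WHERE {d} = '{v}' GROUP BY {d}"
    original = template.format(d=dim_col, m=measure, t=fact_table, v=selected_value)
    siblings = [template.format(d=dim_col, m=measure, t=fact_table, v=v)
                for v in all_values if v != selected_value]
    return original, siblings
```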
Citations: 6
Revisiting aggregation techniques for big data
International Workshop on Data Warehousing and OLAP · Pub Date: 2013-10-28 · DOI: 10.1145/2513190.2517827
V. Tsotras
Abstract: In this talk we first present an introduction to AsterixDB [1], a parallel, semistructured platform to ingest, store, index, query, analyze, and publish "big data" (http://asterixdb.ics.uci.edu), and the various challenges we addressed while building it. AsterixDB combines ideas from semistructured data management, parallel database systems, and first-generation data-intensive computing platforms (MapReduce and Hadoop). The full AsterixDB software stack provides support for big data applications from the storage and processing engine (Hyracks [2], available at http://hyracks.googlecode.com), to the flexible query optimization layer (Algebricks), to the interfaces for user-level interaction (AQL, HiveQL, Pregelix, etc.). Hyracks is a partitioned-parallel engine for data-intensive computing jobs in the form of DAGs. Algebricks is a model-agnostic, algebraic layer for compiling and optimizing parallel queries to be processed by Hyracks. Queries for AsterixDB can be expressed either in popular higher-level data analysis languages like Pig, Hive, or Jaql, or in its native query language (AQL) and data model (ADM), with support for semi-structured information and fuzzy data.

Fundamental data processing operations, like joins and aggregations, are natively supported in AsterixDB. The second part of the talk focuses on our experiences while designing efficient local (per-node) aggregation algorithms for AsterixDB. In particular, there are two challenges for local aggregations in a big data system: first, if the aggregation is group-based (like the GROUP BY in SQL), the aggregation result may not fit in main memory; second, in order to allow multiple operations to be processed simultaneously, an aggregation operation should work within a strict memory budget provided by the platform.

Despite its importance and challenges, the design and evaluation of local aggregation algorithms has not received the same level of attention in the literature as other basic operators, such as joins. Facing a lack of "off the shelf" local aggregation algorithms for big data, we present low-level implementation details for engineering the aggregation operator, utilizing (i) sort-based, (ii) hash-based, and (iii) sort-hash-hybrid approaches. We present six algorithms, all of which work within a strictly bounded memory budget and can easily adapt between in-memory and external processing. Among them, two are novel and four are based on extending existing join algorithms.

We deployed all algorithms as operators in the Hyracks platform and evaluated their performance through extensive experimentation. Our experiments cover many different performance factors, including input cardinality, memory, data distribution, and hash table structure. Our study guided our selection of the local aggregation algorithms supported in the recent release of AsterixDB, namely the Hybrid-Hash Pre-Partitioning algorithm, for its tolerance to errors in the estimation of the input grouping key cardinality.
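As a rough illustration of the memory-budget challenge, a hash-based group-by can be kept within a strict budget by spilling partial aggregates whenever the hash table would exceed the budget, then merging the spilled partitions in a second pass. This sketch is a deliberate simplification, not AsterixDB's actual operator:

```python
from collections import defaultdict

def hash_aggregate(rows, budget, spill):
    """Hash-based group-by SUM under a strict memory budget: when the in-memory
    hash table already holds `budget` distinct groups and a new group arrives,
    flush the partial aggregates to a spill partition and continue."""
    table = {}
    for key, value in rows:
        if key not in table and len(table) >= budget:
            spill(table)          # stand-in for writing partial aggregates to disk
            table = {}
        table[key] = table.get(key, 0) + value
    spill(table)                  # flush the final partial aggregates

def aggregate_with_spill(rows, budget):
    """Run the bounded aggregation, then merge all spilled partitions."""
    partitions = []
    hash_aggregate(rows, budget, partitions.append)
    result = defaultdict(int)
    for part in partitions:       # merge pass: sums of partial sums are exact
        for k, v in part.items():
            result[k] += v
    return dict(result)
```

The merge pass is correct for SUM because partial sums compose; a real operator must also choose hash-table layout and spill-partition fan-out carefully, which is part of what the talk evaluates.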
Citations: 1
Extended dimensions for cleaning and querying inconsistent data warehouses
International Workshop on Data Warehousing and OLAP · Pub Date: 2013-10-28 · DOI: 10.1145/2513190.2513193
J. Ramírez, Loreto Bravo, Mónica Caniupán Marileo
Abstract: A dimension in a data warehouse (DW) is an abstract concept that groups data sharing a common semantic meaning. Dimensions are modeled using a hierarchical schema of categories. A dimension is called strict if every element of each category has exactly one ancestor in each parent category, and covering if each element of a category has an ancestor in each parent category. If a dimension is strict and covering, we can use pre-computed results at lower levels to answer queries at higher levels. This capability of computing summaries is vital for efficiency. Nevertheless, when dimensions are not strict/covering it is important to know their strictness and covering constraints to retain the capability of obtaining correct summarizations. Real-world dimensions might fail to satisfy these constraints, and in these cases it is important to find ways to fix (repair) the dimensions, or to obtain correct answers to queries posed over inconsistent dimensions. A minimal repair is a new dimension that satisfies the strictness and covering constraints and is obtained from the original dimension through a minimum number of changes. The set of minimal repairs can be used as a tool to compute answers to aggregate queries in the presence of inconsistencies. However, computing all of them is NP-hard. In this paper, instead of trying to find all possible minimal repairs, we define a single compatible repair that is consistent with respect to both strictness and covering constraints, is close to the inconsistent dimension, can be computed efficiently, and can be used to compute approximate answers to aggregate queries. In order to define the compatible repair, we introduce the notion of extended dimensions, which support sets of elements in categories.
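The two constraints can be stated compactly: modeling a roll-up as a mapping from each child element to its set of ancestors in a parent category, strictness requires exactly one ancestor per child and covering requires at least one. A small illustrative check of these definitions (not the paper's repair algorithm):

```python
def check_strict_and_covering(rollup):
    """`rollup` maps each child element to the set of its ancestors in one
    parent category. Returns (strict, covering):
      strict   -- every child has exactly one ancestor
      covering -- every child has at least one ancestor
    When both hold, lower-level aggregates can be safely reused at higher levels."""
    strict = all(len(parents) == 1 for parents in rollup.values())
    covering = all(len(parents) >= 1 for parents in rollup.values())
    return strict, covering
```

A repair, in these terms, is a modified `rollup` for which the check returns `(True, True)` with as few edge changes as possible.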
Citations: 2
Optimizing OLAP cube processing on solid state drives
International Workshop on Data Warehousing and OLAP · Pub Date: 2013-10-28 · DOI: 10.1145/2513190.2513197
Zhibo Chen, C. Ordonez
Abstract: Hardware technology has improved to the point where a solid state drive (SSD) can read faster than a traditional hard disk drive (HDD). This ability to retrieve data quickly is a perfect match for OLAP cube processing. In this paper, we study how to improve the performance of OLAP cube processing on SSDs. The main novelty of our work is that we do not alter the internal subsystems of the DBMS; instead, the DBMS treats the SSD as though it were a regular HDD. We propose optimizations for SQL queries to enhance their performance on SSDs. An experimental evaluation with the TPC-H database compares the performance of our optimizations on SSDs and HDDs. We found that even though SSDs have slower write speeds than HDDs, their excellent read speed more than overcomes this limitation.
Citations: 3
Clustering cubes with binary dimensions in one pass
International Workshop on Data Warehousing and OLAP · Pub Date: 2013-10-28 · DOI: 10.1145/2513190.2513192
Carlos Garcia-Alvarado, C. Ordonez
Abstract: Finding aggregations of records with high dimensionality in large data warehouses is a crucial and costly task. These groups of similar records are the result of partitions obtained with GROUP BYs. In this research, we focus on obtaining aggregations of groups of similar records by turning the problem into efficient binary clustering of a fact table, as a relaxation of a GROUP BY clause. We present an efficient window-based Incremental K-Means algorithm, implemented in a relational database system as a user-defined function. The speed-up is achieved through the computation of sufficient statistics, multithreading, efficient distance computation, and sparse matrix operations. Finally, the performance of our algorithm is compared against multiple variants of the K-Means algorithm. Our experiments show that our Incremental K-Means algorithm achieves similar or even better results more quickly than the traditional K-Means algorithm.
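The sufficient-statistics idea mentioned in the abstract is that a cluster's centroid can be maintained from just its point count N and linear sum LS, so each record is touched once and the centroid is always LS / N. A simplified single-pass sketch (the paper's windowed, multithreaded UDF is considerably more elaborate):

```python
def incremental_kmeans(points, k):
    """One-pass incremental K-Means: assign each point to the nearest centroid
    and update that cluster's sufficient statistics (count N, linear sum LS).
    Centroids are derived as LS / N, never stored separately."""
    # seed each cluster with one of the first k points
    stats = [(1, list(p)) for p in points[:k]]          # (N, LS) per cluster
    for p in points[k:]:
        # nearest centroid by squared Euclidean distance, centroid = LS / N
        j = min(range(k), key=lambda c: sum(
            (x - s / stats[c][0]) ** 2 for x, s in zip(p, stats[c][1])))
        n, ls = stats[j]
        stats[j] = (n + 1, [s + x for s, x in zip(ls, p)])
    return [[s / n for s in ls] for n, ls in stats]
```

Because only (N, LS) per cluster is kept, memory is independent of the number of input records, which is what makes the algorithm suitable for running inside a database as a UDF.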
Citations: 2
Social microblogging cube
International Workshop on Data Warehousing and OLAP · Pub Date: 2013-10-28 · DOI: 10.1145/2513190.2513200
Lilia Hannachi, N. Benblidia, F. Bentayeb, Omar Boussaïd
Abstract: Microblogging sites have become a staple of the modern world. They let users keep in touch with their contacts using messages of up to 140 characters, in the case of Twitter. Responding to this emerging trend, it becomes critically important to interactively view and analyze the massive amount of microblogging data from different perspectives and with multiple granularities. In the area of business intelligence, On-Line Analytical Processing (OLAP) is a powerful primitive for data analysis. However, OLAP tools face major challenges in manipulating unstructured text such as microblogging data.

In this paper, we propose a new multidimensional model called the "Microblogging Cube" to apply OLAP techniques to unstructured microblogging data. It provides the possibility to analyze microblog users and locations along semantic, geographic, and temporal axes. The semantic axis is defined using the Open Directory Project (ODP) taxonomy. Unlike existing classical multidimensional models, the measures in the Microblogging Cube may vary depending on the aggregation levels. Further, in order to define the multiple granularities associated with microblog users, we propose a new process to extract the list of their communities.
Citations: 8
ProtOLAP: rapid OLAP prototyping with on-demand data supply
International Workshop on Data Warehousing and OLAP · Pub Date: 2013-10-28 · DOI: 10.1145/2513190.2513199
S. Bimonte, Élodie Edoh-Alove, H. Nazih, Myoung-Ah Kang, S. Rizzi
Abstract: Approaches to data warehouse design are based on the assumption that source data are known in advance and available. While this assumption holds in common project situations, in some peculiar contexts it does not. This is the case for the French national project for the analysis of energetic agricultural farms, which is the case study of this paper. Here, the above-mentioned methods can hardly be applied, because source data can only be identified and collected once user requirements indicate a need. Besides, the users involved in this project found it very hard to express their analysis needs in abstract terms, i.e., without visualizing sample results of queries, which in turn would require the availability of source data. To solve this deadlock we propose ProtOLAP, a tool-assisted fast-prototyping methodology that enables quick and reliable testing and validation of data warehouse schemata in situations where data supply is collected on users' demand and users' ICT skills are minimal. To this end, users manually feed sample realistic data into a prototype created by designers, then access and explore these sample data using pivot tables to validate the prototype.
Citations: 25
HMGraph OLAP: a novel framework for multi-dimensional heterogeneous network analysis
International Workshop on Data Warehousing and OLAP · Pub Date: 2012-11-02 · DOI: 10.1145/2390045.2390067
Mu Yin, Bin Wu, Zengfeng Zeng
Abstract: As information continues to grow at an explosive rate, more and more heterogeneous network data sources are coming into being. While OLAP (On-Line Analytical Processing) techniques have proven effective for analyzing and mining structured data, to the best of our knowledge there are no OLAP tools able to analyze multi-dimensional heterogeneous networks from different perspectives and with multiple granularities. We have therefore developed a novel HMGraph OLAP (Heterogeneous and Multi-dimensional Graph OLAP) framework that provides more dimensions and operations for mining multi-dimensional heterogeneous information networks. Beyond the established information dimensions and topological dimensions, we are the first to propose entity dimensions, an important class of dimensions for heterogeneous network analysis. On this basis, we designed HMGraph OLAP operations on entity dimensions, named Rotate and Stretch, which are able to mine relationships between different entities. We then proposed the HMGraph Cube, an efficient data warehousing model for HMGraph OLAP. Through comparison with common strategies, we show that our proposed optimizations deliver better performance. Finally, we implemented an HMGraph OLAP prototype, LiterMiner, which has proven effective for the analysis of multi-dimensional heterogeneous networks.
Citations: 35