Efficient Algorithms for On-line Analysis Processing On Compressed Data Warehouses

2007 IFIP International Conference on Network and Parallel Computing Workshops (NPC 2007). Pub Date: 2007-09-18. DOI: 10.1109/NPC.2007.182

Jianzhong Li, Hong Gao


Citations: 2

Abstract

Data compression is an effective technique to improve the performance of data warehouses. Aggregation and cube are important operations for on-line analytical processing (OLAP). It is a major challenge to develop efficient algorithms for aggregation and cube operations on compressed data warehouses. Many efficient algorithms to compute aggregation and cube for relational OLAP have been developed. Some work has been done on efficiently computing aggregation and cube for multidimensional data warehouses (MDWs), which store datasets in multidimensional arrays rather than in tables. However, to our knowledge, little work to date in the literature describes aggregation algorithms on compressed data warehouses for multidimensional OLAP. The goal of this paper is to develop efficient algorithms to compute aggregation and cube on compressed MDWs. For aggregation operations, four algorithms are proposed in this paper. These algorithms operate directly on datasets compressed by mapping-complete compression methods, without the need to first decompress them. The algorithms have different performance behaviors as a function of the dataset parameters, sizes of outputs and main memory availability. The algorithms are described and their I/O and CPU cost functions are presented in this paper. A decision procedure to select the most efficient algorithm for a given aggregation request is also proposed. The analysis and experimental results show that the algorithms have better performance on sparse data than the previous aggregation algorithms. For cube operations, this paper presents a novel algorithm to compute cubes on compressed data warehouses. The proposed algorithm also operates directly on compressed datasets without the need to first decompress them. The algorithm is applicable to a large class of mapping-complete data compression methods. The complexity of the algorithm is analyzed in detail. The analytical and experimental results show that the algorithm is more efficient than all other existing cube algorithms. In addition, a heuristic algorithm to generate an optimal plan for computing cube on data warehouses is also proposed in the paper. In conclusion, direct manipulation of compressed data is an important tool for managing very large data warehouses. Aggregation and cube are just two (albeit important) operations in this direction. Additional algorithms will be needed for OLAP on compressed multidimensional data warehouses. We are currently working on algorithms for other operations on compressed MDWs. We are also working on algorithms for OLAP operations applicable to compression methods other than mapping-complete compression methods.
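To make the idea of aggregating without decompression concrete, the following is a minimal illustrative sketch and not one of the paper's four algorithms. It assumes that "mapping-complete" means the compression method supports both a forward mapping (logical cell coordinates to physical position) and a backward mapping (physical position to logical coordinates); the abstract does not define the term. The compression scheme (sorted row-major offsets of non-empty cells), the `CompressedArray` class, and the `aggregate` and `cube` functions are all hypothetical names chosen for illustration.

```python
from bisect import bisect_left
from collections import defaultdict
from itertools import combinations


class CompressedArray:
    """Sparse multidimensional array: sorted row-major offsets of
    non-empty cells plus a parallel list of values (assumed scheme)."""

    def __init__(self, shape, cells):
        self.shape = tuple(shape)
        pairs = sorted((self._offset(c), v) for c, v in cells.items())
        self.offsets = [o for o, _ in pairs]
        self.values = [v for _, v in pairs]

    def _offset(self, coords):
        # Row-major linearization of logical coordinates.
        off = 0
        for c, n in zip(coords, self.shape):
            off = off * n + c
        return off

    def forward(self, coords):
        """Logical coordinates -> physical position, or None if the cell is empty."""
        off = self._offset(coords)
        i = bisect_left(self.offsets, off)
        return i if i < len(self.offsets) and self.offsets[i] == off else None

    def backward(self, pos):
        """Physical position -> logical coordinates."""
        off = self.offsets[pos]
        coords = []
        for n in reversed(self.shape):
            off, c = divmod(off, n)
            coords.append(c)
        return tuple(reversed(coords))


def aggregate(arr, group_dims):
    """SUM grouped by the given dimensions, scanning only stored cells."""
    totals = defaultdict(int)
    for pos, value in enumerate(arr.values):
        coords = arr.backward(pos)  # the dense array is never materialized
        totals[tuple(coords[d] for d in group_dims)] += value
    return dict(totals)


def cube(arr):
    """Naive cube: one aggregation per subset of dimensions (2^d group-bys)."""
    dims = range(len(arr.shape))
    return {g: aggregate(arr, g)
            for k in range(len(arr.shape) + 1)
            for g in combinations(dims, k)}


arr = CompressedArray((2, 3, 2), {(0, 0, 0): 5, (0, 2, 1): 3,
                                  (1, 1, 0): 4, (1, 2, 1): 1})
by_first_dim = aggregate(arr, (0,))
full_cube = cube(arr)
```

The work done by `aggregate` is proportional to the number of stored (non-empty) cells rather than to the size of the logical array, which is one intuition for why operating on the compressed form can pay off on sparse data. The naive `cube` recomputes every group-by from the base data; it makes no attempt to share work between group-bys or to choose a computation plan, which is the kind of optimization the abstract attributes to the paper's cube algorithm and heuristic planner.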



