Optimizing OLAP Cubes Construction by Improving Data Placement on Multi-nodes Clusters

2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing. Pub Date: 2015-03-04. DOI: 10.1109/PDP.2015.45

Billel Arres, N. Kabachi, Omar Boussaïd


Citations: 6

Abstract

The growing volume of relational data calls for alternative ways of handling it. The Hadoop framework, an open source project based on the MapReduce paradigm, is a popular choice for big data analytics. However, the performance gained from Hadoop's features is currently limited by its default block placement policy, which does not take any data characteristics into account. Indeed, the efficiency of many operations, including indexing, grouping, aggregation and joins, can be improved by careful data placement. In this paper we propose a data warehouse placement policy to improve query performance on multi-node clusters, especially Hadoop clusters. We investigate the performance gain for OLAP cube construction queries with and without data organization, varying the number of nodes and the data warehouse size. We found that the proposed data placement policy lowered the global execution time for building OLAP data cubes by up to 20 percent compared to the default data placement.
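
The abstract does not spell out how the proposed placement policy works, so the following is only a minimal Python sketch of the general idea it builds on: when fact and dimension rows that share a join key are stored on the same node, a join-then-aggregate query (one cuboid of an OLAP cube) needs no remote lookups. It is not the authors' policy and does not use any Hadoop API. The function names, the `product_id`, `category` and `amount` columns, and the round-robin stand-in for a content-agnostic placement are all assumptions made for illustration.

```python
from collections import defaultdict


def place_round_robin(rows, num_nodes):
    """Content-agnostic placement (illustrative): row i goes to node i % num_nodes."""
    nodes = defaultdict(list)
    for i, row in enumerate(rows):
        nodes[i % num_nodes].append(row)
    return nodes


def place_by_key(rows, key, num_nodes):
    """Key-aware placement: rows with the same integer join key share a node."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[row[key] % num_nodes].append(row)
    return nodes


def remote_lookups(fact_nodes, dim_nodes, key):
    """Count fact rows whose matching dimension row is stored on another node."""
    dim_location = {
        row[key]: node for node, rows in dim_nodes.items() for row in rows
    }
    return sum(
        1
        for node, rows in fact_nodes.items()
        for row in rows
        if dim_location[row[key]] != node
    )


def build_cuboid(fact_nodes, dim_nodes, key, group_col, measure):
    """Join facts to the dimension and sum a measure per dimension attribute."""
    dim = {row[key]: row for rows in dim_nodes.values() for row in rows}
    totals = defaultdict(int)
    for rows in fact_nodes.values():
        for row in rows:
            totals[dim[row[key]][group_col]] += row[measure]
    return dict(totals)


# Hypothetical toy warehouse: one dimension table and one fact table.
NUM_NODES = 3
dims = [
    {"product_id": p, "category": "even" if p % 2 == 0 else "odd"}
    for p in range(6)
]
facts = [{"product_id": (i * 5) % 6, "amount": i + 1} for i in range(12)]

dim_nodes = place_by_key(dims, "product_id", NUM_NODES)
facts_default = place_round_robin(facts, NUM_NODES)
facts_keyed = place_by_key(facts, "product_id", NUM_NODES)

print("remote lookups, content-agnostic placement:",
      remote_lookups(facts_default, dim_nodes, "product_id"))
print("remote lookups, key-aware placement:",
      remote_lookups(facts_keyed, dim_nodes, "product_id"))
print("cuboid by category:",
      build_cuboid(facts_keyed, dim_nodes, "product_id", "category", "amount"))
```

Both placements yield the same cuboid; only the number of cross-node lookups differs, which is the kind of cost a data-aware placement aims to reduce. How much of that saving carries over to a real cluster depends on factors the sketch ignores, such as replication, block size and data skew.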

