{"title":"通过改进多节点集群上的数据放置来优化OLAP多维数据集构造","authors":"Billel Arres, N. Kabachi, Omar Boussaïd","doi":"10.1109/PDP.2015.45","DOIUrl":null,"url":null,"abstract":"The increasing volumes of relational data let us find an alternative to cope with them. The Hadoop framework - which is an open source project based on the MapReduce paradigm - is a popular choice for big data analytics. However, the performance gained from Hadoop's features is currently limited by its default block placement policy, which does not take any data characteristics into account. Indeed, the efficiency of many operations can be improved by a careful data placement, including indexing, grouping, aggregation and joins. In this paper we propose a data warehouse placement policy to improve query gain performances on multi nodes clusters, especially Hadoop clusters. We investigate the performance gain for OLAP cube construction query with and without data organization. And this, by varying the number of nodes and data warehouse size. It has been found that, the proposed data placement policy has lowered global execution time for building OLAP data cubes up to 20 percent compared to default data placement.","PeriodicalId":285111,"journal":{"name":"2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"63 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"Optimizing OLAP Cubes Construction by Improving Data Placement on Multi-nodes Clusters\",\"authors\":\"Billel Arres, N. Kabachi, Omar Boussaïd\",\"doi\":\"10.1109/PDP.2015.45\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The increasing volumes of relational data let us find an alternative to cope with them. The Hadoop framework - which is an open source project based on the MapReduce paradigm - is a popular choice for big data analytics. However, the performance gained from Hadoop's features is currently limited by its default block placement policy, which does not take any data characteristics into account. Indeed, the efficiency of many operations can be improved by a careful data placement, including indexing, grouping, aggregation and joins. In this paper we propose a data warehouse placement policy to improve query gain performances on multi nodes clusters, especially Hadoop clusters. We investigate the performance gain for OLAP cube construction query with and without data organization. And this, by varying the number of nodes and data warehouse size. It has been found that, the proposed data placement policy has lowered global execution time for building OLAP data cubes up to 20 percent compared to default data placement.\",\"PeriodicalId\":285111,\"journal\":{\"name\":\"2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing\",\"volume\":\"63 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-03-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/PDP.2015.45\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PDP.2015.45","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Optimizing OLAP Cubes Construction by Improving Data Placement on Multi-nodes Clusters
The increasing volumes of relational data let us find an alternative to cope with them. The Hadoop framework - which is an open source project based on the MapReduce paradigm - is a popular choice for big data analytics. However, the performance gained from Hadoop's features is currently limited by its default block placement policy, which does not take any data characteristics into account. Indeed, the efficiency of many operations can be improved by a careful data placement, including indexing, grouping, aggregation and joins. In this paper we propose a data warehouse placement policy to improve query gain performances on multi nodes clusters, especially Hadoop clusters. We investigate the performance gain for OLAP cube construction query with and without data organization. And this, by varying the number of nodes and data warehouse size. It has been found that, the proposed data placement policy has lowered global execution time for building OLAP data cubes up to 20 percent compared to default data placement.