Efficient Star Join for Column-oriented Data Store in the MapReduce Environment

Haitong Zhu, Minqi Zhou, Fan Xia, Aoying Zhou
{"title":"Efficient Star Join for Column-oriented Data Store in the MapReduce Environment","authors":"Haitong Zhu, Minqi Zhou, Fan Xia, Aoying Zhou","doi":"10.1109/WISA.2011.10","DOIUrl":null,"url":null,"abstract":"Map Reduce is a parallel computing paradigm that has gained a lot of attention from both industry and academia recent years. Unlike parallel DBMSs, with Map Reduce, it is easier for non-expert to develop scalable parallel programs for analytical applications over huge data sets across clusters of commodity machines. As the nature of scan-oriented processing, the performance of Map Reduce for relation operators can be enhanced dramatically since it is inevitably accessing lots of unnecessary data tuples, especially for table join operators. In this paper, we propose an efficient star join strategy called HdBmp join for column-oriented data store by using a three-level content aware index (i.e., HdBmp Index). Armed with this index, most of the unnecessary tuples in the join processing can be filtered out, and consequently result in immense reduction in both communication cost and execution time. Our extensive experimental studies confirm the efficiency, scalability and effectiveness of our new proposed join methods.","PeriodicalId":242633,"journal":{"name":"2011 Eighth Web Information Systems and Applications Conference","volume":"7 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 Eighth Web Information Systems and Applications Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/WISA.2011.10","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

Map Reduce is a parallel computing paradigm that has gained a lot of attention from both industry and academia recent years. Unlike parallel DBMSs, with Map Reduce, it is easier for non-expert to develop scalable parallel programs for analytical applications over huge data sets across clusters of commodity machines. As the nature of scan-oriented processing, the performance of Map Reduce for relation operators can be enhanced dramatically since it is inevitably accessing lots of unnecessary data tuples, especially for table join operators. In this paper, we propose an efficient star join strategy called HdBmp join for column-oriented data store by using a three-level content aware index (i.e., HdBmp Index). Armed with this index, most of the unnecessary tuples in the join processing can be filtered out, and consequently result in immense reduction in both communication cost and execution time. Our extensive experimental studies confirm the efficiency, scalability and effectiveness of our new proposed join methods.
MapReduce环境下面向列数据存储的高效星型联接
Map Reduce是一种并行计算范式,近年来受到了工业界和学术界的广泛关注。与并行dbms不同的是,对于非专业人员来说,使用Map Reduce可以更容易地开发可扩展的并行程序,用于跨商业机器集群的大型数据集的分析应用程序。作为面向扫描处理的本质,Map Reduce对于关系操作符的性能可以得到显著增强,因为它不可避免地要访问大量不必要的数据元组,特别是对于表连接操作符。本文通过使用三级内容感知索引(即HdBmp index),为面向列的数据存储提出了一种高效的星型连接策略HdBmp连接。有了这个索引,连接处理中大多数不必要的元组都可以被过滤掉,从而大大减少了通信成本和执行时间。我们广泛的实验研究证实了我们提出的新连接方法的效率、可扩展性和有效性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信