HView: Multi-dimension view of massive data in Hadoop

Fuhui Wu, Q. Wu, Yusong Tan
Published in: Proceedings of 2013 3rd International Conference on Computer Science and Network Technology, 2013-10-01
DOI: 10.1109/ICCSNT.2013.6967146 (https://doi.org/10.1109/ICCSNT.2013.6967146)
Citations: 0

Abstract

Hadoop has become an attractive platform for storing large-scale data in HDFS and analyzing it with the MapReduce framework. However, a multi-field dataset in HDFS is usually stored along just one dimension, so analytics in Hadoop usually have to process the whole dataset by brute force. In this paper, we introduce HView, an extension of the data layout in HDFS that stores data according to multiple fields. HView gives users views of the same dataset in HDFS along different dimensions. HView does not require modifying Hadoop, does not increase DataNode storage usage, and does not put additional pressure on the NameNode. We explore map-side join as a use case for HView. Experimental results show that HView improves the efficiency of map-side join and solves its size-limit problem.
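For readers unfamiliar with the size limit mentioned above, the following is a minimal sketch of a conventional map-side (in-memory hash) join, written in plain Python rather than against the Hadoop API. It is not HView's implementation, which the abstract does not describe; the record shape and the names `build_lookup`, `map_side_join`, `users`, and `clicks` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record shape: (join_key, payload).
Record = Tuple[str, str]


def build_lookup(small_table: Iterable[Record]) -> Dict[str, List[str]]:
    """Load the entire small table into memory, keyed by the join field."""
    lookup: Dict[str, List[str]] = defaultdict(list)
    for key, payload in small_table:
        lookup[key].append(payload)
    return lookup


def map_side_join(
    split: Iterable[Record], lookup: Dict[str, List[str]]
) -> Iterator[Tuple[str, str, str]]:
    """Join one input split of the large table against the in-memory lookup.

    No shuffle or reduce phase is needed, but `lookup` has to fit in the
    memory of every mapper. That requirement is the size limit.
    """
    for key, payload in split:
        for small_payload in lookup.get(key, ()):
            yield key, payload, small_payload


# Toy data standing in for a small table and one split of a large table.
users = [("u1", "alice"), ("u2", "bob")]
clicks = [("u1", "/home"), ("u3", "/cart"), ("u2", "/docs"), ("u1", "/faq")]

joined = list(map_side_join(clicks, build_lookup(users)))
```

In this sketch every mapper holds the whole of `users` in memory, so the approach only works while the smaller table stays small.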