NM2H: Design and Implementation of NoSQL Extension for HDFS Metadata Management

Ruini Xue, Z. Guan, Shengli Gao, Lixiang Ao
Published in: 2014 IEEE 17th International Conference on Computational Science and Engineering
Publication date: 2014-12-19
DOI: 10.1109/CSE.2014.246 (https://doi.org/10.1109/CSE.2014.246)
Citations: 2

Abstract

Hadoop, a distributed MapReduce framework, is widely used for big data processing, with HDFS (Hadoop Distributed File System) as its usual storage layer. Although the single-master architecture of HDFS simplifies design and implementation, it makes the master a single point of failure (SPOF) and limits scalability, and the master can in turn become a performance bottleneck. To address these problems, this paper proposes NM2H, a NoSQL-based metadata management approach for HDFS. Unlike the traditional architecture, which combines the two, NM2H separates the storage of metadata from its querying. A novel mapping from the original metadata structures to NoSQL documents keeps the interfaces among the metadata service, Data Nodes, and clients unchanged. As a result, the approach benefits from the better scalability and fault tolerance of NoSQL while remaining transparent to client applications: existing programs run on the new architecture without any modification. A prototype of NM2H was designed and implemented on MongoDB, a widely adopted NoSQL system. Extensive performance evaluation was conducted; the experimental results indicated an improvement with NM2H, and the overhead it introduced was acceptable.
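The abstract does not give the actual mapping between HDFS metadata structures and NoSQL documents, so the following is only a minimal sketch of the general idea, not the authors' design. It assumes a simplified namespace entry (`INode`), a document layout keyed by full path, and an in-memory `DocumentStore` standing in for a document collection; all of these names and fields are hypothetical and are not taken from the paper, HDFS, or MongoDB.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class INode:
    """Hypothetical, simplified stand-in for an HDFS namespace entry."""
    path: str
    is_dir: bool
    replication: int = 0
    block_ids: List[int] = field(default_factory=list)


def inode_to_document(inode: INode) -> dict:
    """Flatten an INode into a JSON-like document keyed by its full path.

    The field names are assumptions made for illustration only.
    """
    if inode.path == "/":
        parent, name = None, ""
    else:
        parent, _, name = inode.path.rpartition("/")
        parent = parent or "/"
    return {
        "_id": inode.path,
        "parent": parent,
        "name": name,
        "is_dir": inode.is_dir,
        "replication": inode.replication,
        "blocks": list(inode.block_ids),
    }


class DocumentStore:
    """In-memory stand-in for a document collection (not MongoDB itself)."""

    def __init__(self) -> None:
        self._docs: Dict[str, dict] = {}

    def upsert(self, doc: dict) -> None:
        # Insert the document, or overwrite the one with the same _id.
        self._docs[doc["_id"]] = doc

    def find_one(self, path: str) -> Optional[dict]:
        # Point lookup by full path, e.g. for a file-status request.
        return self._docs.get(path)

    def list_children(self, parent: str) -> List[str]:
        # Directory listing expressed as a query on the "parent" field.
        return sorted(
            d["name"] for d in self._docs.values() if d["parent"] == parent
        )


store = DocumentStore()
for node in (
    INode("/data", is_dir=True),
    INode("/data/a.txt", is_dir=False, replication=3, block_ids=[1001, 1002]),
    INode("/data/b.txt", is_dir=False, replication=3, block_ids=[1003]),
):
    store.upsert(inode_to_document(node))

print(store.list_children("/data"))
print(store.find_one("/data/a.txt")["blocks"])
```

In this sketch the metadata service would answer namespace requests by querying the store rather than walking an in-memory tree, which is one way to read the abstract's separation of metadata storage from metadata querying. A real deployment would replace `DocumentStore` with a replicated document database.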