NM2H: Design and Implementation of NoSQL Extension for HDFS Metadata Management
Authors: Ruini Xue, Z. Guan, Shengli Gao, Lixiang Ao
Venue: 2014 IEEE 17th International Conference on Computational Science and Engineering
Published: 2014-12-19
DOI: 10.1109/CSE.2014.246 (https://doi.org/10.1109/CSE.2014.246)
Citations: 2
Abstract
Hadoop, a distributed MapReduce framework, has been widely adopted for big data processing, with HDFS (Hadoop Distributed File System) as its most common storage layer. Although the single-master architecture of HDFS simplifies design and implementation, it suffers from a single point of failure (SPOF) and limited scalability, both of which can in turn become performance bottlenecks. To address these problems, this paper proposes NM2H, a NoSQL-based metadata management approach for HDFS. Unlike the traditional architecture, which mixes metadata storage with metadata querying, NM2H separates the two. It keeps the interfaces among the metadata service, DataNodes, and clients unchanged through a novel mapping between the original metadata structures and NoSQL documents. The new approach therefore benefits from NoSQL's better scalability and fault tolerance while remaining transparent to client applications: existing programs can run on the new architecture without any modification. A prototype of NM2H was designed and implemented on MongoDB, a widely adopted NoSQL system. Extensive performance evaluation was conducted, and the experimental results indicated that NM2H delivers an improvement while introducing acceptable overhead.
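The abstract does not give the actual mapping NM2H uses, so the following is only a minimal sketch of what mapping file-system metadata to NoSQL documents could look like. Everything in it is an assumption made for illustration: the `INode` fields, the document keys (`_id`, `parent`, `blocks`), and the dict-backed `InMemoryStore` that stands in for a document collection such as one in MongoDB. It is not the schema or code from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class INode:
    """Simplified stand-in for an HDFS namespace entry (hypothetical fields)."""
    path: str
    is_dir: bool
    replication: int = 0
    block_ids: List[int] = field(default_factory=list)


def inode_to_document(inode: INode) -> dict:
    """Flatten an inode into a dict shaped like a document-store record."""
    # Parent path is everything before the last "/"; top-level entries map to "/".
    parent = inode.path.rsplit("/", 1)[0] or "/"
    return {
        "_id": inode.path,  # the path doubles as the primary key (assumption)
        "parent": None if inode.path == "/" else parent,
        "is_dir": inode.is_dir,
        "replication": inode.replication,
        "blocks": list(inode.block_ids),
    }


def document_to_inode(doc: dict) -> INode:
    """Rebuild the in-memory structure, so callers never see the document form."""
    return INode(doc["_id"], doc["is_dir"], doc["replication"], list(doc["blocks"]))


class InMemoryStore:
    """Dict-backed stand-in for a document collection (hypothetical)."""

    def __init__(self) -> None:
        self._docs: Dict[str, dict] = {}

    def put(self, doc: dict) -> None:
        self._docs[doc["_id"]] = doc

    def get(self, path: str) -> Optional[dict]:
        return self._docs.get(path)

    def children(self, parent: str) -> List[str]:
        # A directory listing becomes a query on the "parent" field.
        return sorted(d["_id"] for d in self._docs.values() if d["parent"] == parent)
```

In this sketch, a lookup or directory listing is answered by querying the store and converting the result back with `document_to_inode`, which is one way the metadata service's outward-facing interface could stay unchanged while storage moves to a document database.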