From Backup to Hot Standby: High Availability for HDFS

Andrew Oriani, Islene C. Garcia
{"title":"From Backup to Hot Standby: High Availability for HDFS","authors":"Andrew Oriani, Islene C. Garcia","doi":"10.1109/SRDS.2012.33","DOIUrl":null,"url":null,"abstract":"Cluster-based distributed file systems generally have a single master to service clients and manage the namespace. Although simple and efficient, that design compromises availability, because the failure of the master takes the entire system down. Before version 2.0.0-alpha, the Hadoop Distributed File System (HDFS) -- an open-source storage, widely used by applications that operate over large datasets, such as MapReduce, and for which an uptime of 24x7 is becoming essential -- was an example of such systems. Given that scenario, this paper proposes a hot standby for the master of HDFS achieved by (i) extending the master's state replication performed by its check pointer helper, the Backup Node, and by (ii) introducing an automatic fail over mechanism. The step (i) took advantage of the message duplication technique developed by other high availability solution for HDFS named Avatar Nodes. The step (ii) employed another Hadoop software: ZooKeeper, a distributed coordination service. That approach resulted in small code changes, 1373 lines, not requiring external components to the Hadoop project. Thus, easing the maintenance and deployment of the file system. Compared to HDFS 0.21, tests showed that both in loads dominated by metadata operations or I/O operations, the reduction of data throughput is no more than 15% on average, and the time to switch the hot standby to active is less than 100 ms. Those results demonstrate the applicability of our solution to real systems. We also present related work on high availability for other file systems and HDFS, including the official solution, recently included in HDFS 2.0.0-alpha.","PeriodicalId":447700,"journal":{"name":"2012 IEEE 31st Symposium on Reliable Distributed Systems","volume":"50 4","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"31","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 IEEE 31st Symposium on Reliable Distributed Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SRDS.2012.33","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 31

Abstract

Cluster-based distributed file systems generally have a single master to service clients and manage the namespace. Although simple and efficient, that design compromises availability, because the failure of the master takes the entire system down. Before version 2.0.0-alpha, the Hadoop Distributed File System (HDFS) -- an open-source storage, widely used by applications that operate over large datasets, such as MapReduce, and for which an uptime of 24x7 is becoming essential -- was an example of such systems. Given that scenario, this paper proposes a hot standby for the master of HDFS achieved by (i) extending the master's state replication performed by its check pointer helper, the Backup Node, and by (ii) introducing an automatic fail over mechanism. The step (i) took advantage of the message duplication technique developed by other high availability solution for HDFS named Avatar Nodes. The step (ii) employed another Hadoop software: ZooKeeper, a distributed coordination service. That approach resulted in small code changes, 1373 lines, not requiring external components to the Hadoop project. Thus, easing the maintenance and deployment of the file system. Compared to HDFS 0.21, tests showed that both in loads dominated by metadata operations or I/O operations, the reduction of data throughput is no more than 15% on average, and the time to switch the hot standby to active is less than 100 ms. Those results demonstrate the applicability of our solution to real systems. We also present related work on high availability for other file systems and HDFS, including the official solution, recently included in HDFS 2.0.0-alpha.
从备份到热备:HDFS的高可用性
基于集群的分布式文件系统通常有一个主服务器来服务客户机和管理名称空间。尽管这种设计简单而有效,但它会损害可用性,因为主服务器的故障会使整个系统瘫痪。在2.0.0-alpha版本之前,Hadoop分布式文件系统(HDFS)就是此类系统的一个例子。HDFS是一种开源存储,广泛用于运行大型数据集的应用程序,如MapReduce,并且24x7的正常运行时间变得至关重要。在这种情况下,本文提出了HDFS主节点的热备,通过以下方式实现:(i)扩展主节点的状态复制,由其检查指针助手Backup Node执行,以及(ii)引入自动故障转移机制。步骤(i)利用了其他高可用性解决方案开发的消息复制技术,名为Avatar Nodes。步骤(ii)使用了另一个Hadoop软件:ZooKeeper,一个分布式协调服务。这种方法只对代码进行了很小的修改,只有1373行,不需要Hadoop项目的外部组件。从而简化了文件系统的维护和部署。与HDFS 0.21相比,测试表明,无论是元数据操作为主的负载还是I/O操作为主的负载,数据吞吐量的平均下降幅度都不超过15%,双机热备切换到主用的时间都在100ms以内。这些结果证明了我们的解决方案在实际系统中的适用性。我们还介绍了其他文件系统和HDFS的高可用性相关工作,包括最近包含在HDFS 2.0.0-alpha中的官方解决方案。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信