{"title":"From Backup to Hot Standby: High Availability for HDFS","authors":"Andrew Oriani, Islene C. Garcia","doi":"10.1109/SRDS.2012.33","DOIUrl":null,"url":null,"abstract":"Cluster-based distributed file systems generally have a single master to service clients and manage the namespace. Although simple and efficient, that design compromises availability, because the failure of the master takes the entire system down. Before version 2.0.0-alpha, the Hadoop Distributed File System (HDFS) -- an open-source storage, widely used by applications that operate over large datasets, such as MapReduce, and for which an uptime of 24x7 is becoming essential -- was an example of such systems. Given that scenario, this paper proposes a hot standby for the master of HDFS achieved by (i) extending the master's state replication performed by its check pointer helper, the Backup Node, and by (ii) introducing an automatic fail over mechanism. The step (i) took advantage of the message duplication technique developed by other high availability solution for HDFS named Avatar Nodes. The step (ii) employed another Hadoop software: ZooKeeper, a distributed coordination service. That approach resulted in small code changes, 1373 lines, not requiring external components to the Hadoop project. Thus, easing the maintenance and deployment of the file system. Compared to HDFS 0.21, tests showed that both in loads dominated by metadata operations or I/O operations, the reduction of data throughput is no more than 15% on average, and the time to switch the hot standby to active is less than 100 ms. Those results demonstrate the applicability of our solution to real systems. We also present related work on high availability for other file systems and HDFS, including the official solution, recently included in HDFS 2.0.0-alpha.","PeriodicalId":447700,"journal":{"name":"2012 IEEE 31st Symposium on Reliable Distributed Systems","volume":"50 4","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"31","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 IEEE 31st Symposium on Reliable Distributed Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SRDS.2012.33","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 31
Abstract
Cluster-based distributed file systems generally have a single master to service clients and manage the namespace. Although simple and efficient, that design compromises availability, because the failure of the master takes the entire system down. Before version 2.0.0-alpha, the Hadoop Distributed File System (HDFS) -- an open-source storage, widely used by applications that operate over large datasets, such as MapReduce, and for which an uptime of 24x7 is becoming essential -- was an example of such systems. Given that scenario, this paper proposes a hot standby for the master of HDFS achieved by (i) extending the master's state replication performed by its check pointer helper, the Backup Node, and by (ii) introducing an automatic fail over mechanism. The step (i) took advantage of the message duplication technique developed by other high availability solution for HDFS named Avatar Nodes. The step (ii) employed another Hadoop software: ZooKeeper, a distributed coordination service. That approach resulted in small code changes, 1373 lines, not requiring external components to the Hadoop project. Thus, easing the maintenance and deployment of the file system. Compared to HDFS 0.21, tests showed that both in loads dominated by metadata operations or I/O operations, the reduction of data throughput is no more than 15% on average, and the time to switch the hot standby to active is less than 100 ms. Those results demonstrate the applicability of our solution to real systems. We also present related work on high availability for other file systems and HDFS, including the official solution, recently included in HDFS 2.0.0-alpha.