{"title":"MAMS: A Highly Reliable Policy for Metadata Service","authors":"Jiang Zhou, Yong Chen, Weiping Wang, Dan Meng","doi":"10.1109/ICPP.2015.82","DOIUrl":null,"url":null,"abstract":"Most mass data processing applications nowadays often need long, continuous, and uninterrupted data access. Parallel/distributed file systems often use multiple metadata servers to manage the global namespace and provide a reliability guarantee. With the rapid increase of data amount and system scale, the probability of hardware or software failures keeps increasing, which easily leads to multiple points of failures. Metadata service reliability has become a crucial issue as it affects file and directory operations in the event of failures. Existing reliable metadata management mechanisms can provide fault tolerance but have disadvantages in system availability, state consistence, and performance overhead. This paper introduces a new highly reliable policy called MAMS (multiple actives multiple standbys) to ensure multiple metadata service reliability in file systems. Different from traditional strategies, the MAMS divides metadata servers into different replica groups and maintains more than one standby node for failover in each group. Combining the global view with distributed protocols, the MAMS achieves an automatic state transition and service takeover. We have implemented the MAMS policy in a prototyping file system and conducted extensive tests to validate and evaluate it. The experimental results confirm that the MAMS policy can achieve a faster transparent fault tolerance in different error scenarios with less influence on metadata operations. Compared with typical designs in Hadoop Avatar, Hadoop HA, and Boom-FS file systems, the mean time to recovery (MTTR) with the MAMS was reduced by 80.23%, 65.46% and 28.13%, respectively.","PeriodicalId":423007,"journal":{"name":"2015 44th International Conference on Parallel Processing","volume":"170 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 44th International Conference on Parallel Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICPP.2015.82","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 4
Abstract
Mass data processing applications increasingly require long, continuous, and uninterrupted data access. Parallel/distributed file systems often use multiple metadata servers to manage the global namespace and to provide a reliability guarantee. As data volume and system scale grow rapidly, the probability of hardware or software failures keeps increasing, which easily leads to multiple points of failure. Metadata service reliability has therefore become a crucial issue, since failures affect file and directory operations. Existing reliable metadata management mechanisms provide fault tolerance but suffer from drawbacks in system availability, state consistency, and performance overhead. This paper introduces a new highly reliable policy called MAMS (multiple actives, multiple standbys) to ensure the reliability of multiple metadata services in file systems. Unlike traditional strategies, MAMS divides metadata servers into replica groups and maintains more than one standby node in each group for failover. By combining a global view with distributed protocols, MAMS achieves automatic state transition and service takeover. We implemented the MAMS policy in a prototype file system and conducted extensive tests to validate and evaluate it. The experimental results confirm that MAMS achieves faster, transparent fault tolerance in different error scenarios with less influence on metadata operations. Compared with typical designs in the Hadoop Avatar, Hadoop HA, and Boom-FS file systems, the mean time to recovery (MTTR) with MAMS was reduced by 80.23%, 65.46%, and 28.13%, respectively.
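
To make the replica-group idea concrete, below is a minimal, hypothetical sketch of the "multiple actives, multiple standbys" structure the abstract describes: each group keeps one active metadata server plus an ordered list of standbys, and a failure of the active triggers promotion of the next standby. This is not the paper's implementation; all identifiers (ReplicaGroup, mds0, and so on) are illustrative assumptions.

```python
# Hypothetical sketch of the MAMS idea: metadata servers are partitioned
# into replica groups, each with one active server and several standbys;
# when the active fails, a standby takes over so metadata service in the
# other groups is unaffected. Names are illustrative, not from the paper.
from dataclasses import dataclass, field

@dataclass
class ReplicaGroup:
    name: str
    active: str                                         # currently serving metadata server
    standbys: list[str] = field(default_factory=list)   # warm standbys, in failover order

    def failover(self) -> str:
        """Promote the first standby to active when the active node fails."""
        if not self.standbys:
            raise RuntimeError(f"group {self.name}: no standby left for takeover")
        failed, self.active = self.active, self.standbys.pop(0)
        print(f"group {self.name}: {failed} failed, {self.active} took over")
        return self.active

# The namespace is split across groups, so a single failure only
# triggers a takeover within its own group.
groups = [
    ReplicaGroup("g0", active="mds0", standbys=["mds1", "mds2"]),
    ReplicaGroup("g1", active="mds3", standbys=["mds4", "mds5"]),
]
groups[0].failover()   # mds1 becomes active in group g0; g1 is untouched
```

In the paper's design, failure detection and the decision to promote a standby come from combining a global view with distributed protocols; the sketch above only shows the group structure and the takeover step, not that coordination logic.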