On improving recovery performance in erasure code based geo-diverse storage clusters

2016 12th International Conference on the Design of Reliable Communication Networks (DRCN). Pub Date: 2016-03-15. DOI: 10.1109/DRCN.2016.7470846

Pablo Ignacio Serrano Caneleo, Lakshmi J. Mohan, P. Udaya, A. Harwood


Citations: 4

On improving recovery performance in erasure code based geo-diverse storage clusters

Erasure-code-based distributed storage systems are increasingly used by storage providers for big data storage, since they offer the same reliability as replication while requiring significantly less storage. However, in a storage system whose data nodes are spread across a very large geographical area, the code's recovery performance is affected by various factors, both network and computation related. In this paper, we present an XOR-based code supplemented with the ideas of parity duplication and rack awareness, which could be adopted in such storage clusters to improve recovery performance during node failures. We have implemented these ideas in the erasure code module of the XORBAS version of the Hadoop Distributed File System (HDFS). To evaluate the proposed ideas, we employ a geo-diverse cluster on the NeCTAR research cloud. The experimental results show that the techniques reduce the data read for repair by 85% and the repair duration by 57% during node failures, at the cost of a 21% increase in storage requirement compared to the traditional Reed-Solomon codes used in HDFS. Taken together, these ideas could offer a better solution for a code-based storage system that spans a wide geographical area, has storage constraints such that a triple-replicated system is not affordable, and at the same time has strict requirements on minimizing recovery time.
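To make the XOR-based repair idea in the abstract concrete, the following is a minimal Python sketch of how one lost block in a group can be rebuilt from the surviving blocks plus an XOR parity block. It is an illustration under stated assumptions, not the authors' implementation and not the actual code layout used in the XORBAS version of HDFS: the function names `xor_blocks` and `recover_block`, the three-block group, and the notion of a `parity_copy` held on a different rack are all hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equal-length byte blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def recover_block(surviving_blocks, parity):
    """Rebuild the single missing block of a group from the rest plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


# Hypothetical group of three data blocks protected by one XOR parity.
data = [b"geo-", b"dive", b"rse!"]
parity = xor_blocks(data)

# Assumption for illustration: a duplicate of the parity is stored on a
# second rack, so repair can read whichever copy is closer to the survivors.
parity_copy = parity

lost_index = 1
survivors = [blk for i, blk in enumerate(data) if i != lost_index]
rebuilt = recover_block(survivors, parity_copy)
```

In this sketch a repair reads only the blocks of one small group rather than a full stripe, which is the general reason XOR-based local parities can reduce the data read during repair; the specific percentages reported in the abstract come from the authors' experiments, not from this example.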
