Practical Design Considerations for Wide Locally Recoverable Codes (LRCs)

IF 2.1 | Zone 3 (Computer Science) | JCR Q3, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE
Saurabh Kadekodi, Shashwat Silas, David Clausen, Arif Merchant
Journal: ACM Transactions on Storage
Published: 2023-11-14
DOI: 10.1145/3626198 (https://doi.org/10.1145/3626198)
Citations: 1

Abstract

Most of the data in large-scale storage clusters is erasure coded. At exascale, optimizing erasure codes for low storage overhead, efficient reconstruction, and easy deployment is of critical importance. Locally recoverable codes (LRCs) have deservedly gained central importance in this field, because they can balance many of these requirements. In our work, we study wide LRCs: LRCs with a large number of blocks per stripe and low storage overhead. These codes are a natural next step for practitioners to unlock higher storage savings, but they come with their own challenges. Of particular interest is their reliability, since wider stripes are prone to more simultaneous failures.

We conduct a practically minded analysis of several popular and novel LRCs. We find that wide LRC reliability is a subtle phenomenon that is sensitive to several design choices, some of which are overlooked by theoreticians, and others by practitioners. Based on these insights, we construct novel LRCs called Uniform Cauchy LRCs, which show excellent performance in simulations and a 33% improvement in reliability on unavailability events observed by a wide LRC deployed in a Google storage cluster. We also show that these codes are easy to deploy in a manner that improves their robustness to common maintenance events. Along the way, we also give a remarkably simple and novel construction of distance-optimal LRCs (other constructions are also known), which may be of interest to theory-minded readers.
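
The trade-off the abstract describes (wider stripes lower storage overhead, while local groups keep single-block repair cheap) can be illustrated with a small sketch. This is not the paper's Uniform Cauchy construction: it assumes a simplified layout in which data blocks are split into equal local groups, each protected by one XOR parity, and the class name, field names, and stripe parameters below are hypothetical.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class LrcParams:
    """Hypothetical LRC stripe parameters (illustrative names, not from the paper)."""
    data_blocks: int      # k: data blocks per stripe
    local_parities: int   # l: one XOR parity per local group (assumed layout)
    global_parities: int  # r: global parity blocks

    @property
    def width(self) -> int:
        # Total number of blocks in the stripe.
        return self.data_blocks + self.local_parities + self.global_parities

    @property
    def storage_overhead(self) -> float:
        # Bytes stored per byte of user data.
        return self.width / self.data_blocks

    @property
    def local_repair_reads(self) -> int:
        # Blocks read to rebuild one lost data block in the largest group:
        # the other data blocks of the group plus the group's local parity.
        return ceil(self.data_blocks / self.local_parities)


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together byte by byte."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


# Two made-up stripes: a narrow one and a wide one.
narrow = LrcParams(data_blocks=12, local_parities=2, global_parities=2)
wide = LrcParams(data_blocks=48, local_parities=4, global_parities=4)

# Toy local repair: rebuild the middle block of a group from its peers and parity.
group = [b"ab", b"cd", b"ef"]
parity = xor_blocks(group)
recovered = xor_blocks([group[0], group[2], parity])

if __name__ == "__main__":
    for name, p in (("narrow", narrow), ("wide", wide)):
        print(name, p.width, round(p.storage_overhead, 3), p.local_repair_reads)
    print(recovered == group[1])
```

Under these assumptions the wide stripe stores fewer redundant bytes per byte of data, but it has more blocks that can fail at once and a larger local group to read during repair, which is why the abstract singles out reliability as the main concern.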

Source journal
ACM Transactions on Storage
Categories: COMPUTER SCIENCE, HARDWARE & ARCHITECTURE; COMPUTER SCIENCE, SOFTWARE ENGINEERING
CiteScore: 4.20
Self-citation rate: 5.90%
Articles published: 33
Review time: >12 weeks
About the journal: The ACM Transactions on Storage (TOS) is a new journal that intends to publish original archival papers in the area of storage and closely related disciplines. Articles that appear in TOS will tend either to present new techniques and concepts or to report novel experiences and experiments with practical systems. Storage is a broad and multidisciplinary area that comprises network protocols, resource management, data backup, replication, recovery, devices, security, and the theory of data coding, densities, and low power. Potential synergies among these fields are expected to open up new research directions.