Optimal repair and load balance in locally repairable codes: Design and evaluation
Ximeng Chen, Si Wu, Hao Zhao, Yinlong Xu
Future Generation Computer Systems: The International Journal of eScience, Volume 175, Article 108113 (published 2025-09-03)
DOI: 10.1016/j.future.2025.108113
URL: https://www.sciencedirect.com/science/article/pii/S0167739X25004078
Erasure coding is increasingly deployed in modern clustered storage systems to provide low-cost, reliable storage. In particular, Locally Repairable Codes (LRCs) are a popular family of repair-efficient erasure codes that are widely deployed in practice. In this paper, we formulate the storage process of LRCs in clustered storage systems as a data partitioning phase followed by a node selection phase, and we analyze both phases. We show that conventional flat partitioning and random partitioning incur significant cross-cluster repair traffic, while random node selection causes storage and network imbalance. To address these problems, we design a new storage scheme for LRCs that combines an optimal partitioning strategy with an enhanced node selection strategy. Our partitioning strategy minimizes cross-cluster repair traffic by spreading each group of blocks over the minimum number of clusters and placing the blocks compactly. Our node selection strategy improves load balance by choosing less-loaded clusters and nodes, giving priority to blocks with potentially higher access frequency. To accommodate access fluctuations, we extend the scheme with a rebalancing strategy that restores storage and network balance at both the cluster and node levels. We implement our storage scheme in a key-value store prototype built atop Memcached. Evaluation on a LAN testbed shows that our scheme greatly improves repair performance and the load balance ratio compared to the baseline.
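The abstract describes two placement ideas: spreading each local group over as few clusters as possible, and placing hotter blocks on less-loaded nodes. The Python sketch below illustrates both under simplifying assumptions of our own; it is not the authors' algorithm. It assumes a hypothetical per-cluster cap (`cap`) on how many blocks of one group a cluster may hold (for example, to preserve cluster-level fault tolerance), an XOR-based local parity so that each helper cluster can combine its blocks into a single partially decoded block before sending it, and known per-block access frequencies. All function names are illustrative.

```python
from math import ceil


def compact_partition(group_size: int, cap: int) -> list[int]:
    """Split one local group into the fewest clusters, filling each cluster
    up to a hypothetical per-cluster cap of `cap` blocks.
    Returns the number of group blocks placed in each cluster."""
    if group_size <= 0 or cap <= 0:
        raise ValueError("group_size and cap must be positive")
    n_clusters = ceil(group_size / cap)
    sizes = [cap] * (n_clusters - 1)
    sizes.append(group_size - cap * (n_clusters - 1))  # remainder cluster
    return sizes


def flat_partition(group_size: int) -> list[int]:
    """Flat placement: each block of the group goes to its own cluster."""
    return [1] * group_size


def avg_cross_cluster_repair(sizes: list[int]) -> float:
    """Average number of blocks sent across clusters to repair one lost block
    of the group. Assumes an XOR local parity, so each helper cluster can
    combine its blocks and send a single partially decoded block."""
    total = 0
    for i, blocks_here in enumerate(sizes):
        # Every other non-empty cluster contributes one aggregated block.
        helper_clusters = sum(1 for j, s in enumerate(sizes) if j != i and s > 0)
        # Each of the blocks stored in cluster i may be the lost one.
        total += blocks_here * helper_clusters
    return total / sum(sizes)


def select_nodes(block_freqs: dict[str, float],
                 node_loads: dict[str, float]) -> dict[str, str]:
    """Place one stripe's blocks on distinct nodes: the block with the highest
    expected access frequency goes to the least-loaded node, and so on.
    Updates node_loads in place with the assigned access load."""
    if len(block_freqs) > len(node_loads):
        raise ValueError("need at least one node per block")
    blocks = sorted(block_freqs, key=lambda b: -block_freqs[b])
    nodes = sorted(node_loads, key=lambda n: (node_loads[n], n))
    placement = {}
    for block, node in zip(blocks, nodes):
        placement[block] = node
        node_loads[node] += block_freqs[block]
    return placement


if __name__ == "__main__":
    # A local group of 6 blocks (e.g., 5 data + 1 local parity), cap of 3 per cluster.
    print(compact_partition(6, 3), avg_cross_cluster_repair(compact_partition(6, 3)))
    print(flat_partition(6), avg_cross_cluster_repair(flat_partition(6)))
    loads = {"n0": 4.0, "n1": 0.0, "n2": 1.0}
    print(select_nodes({"d0": 3.0, "d1": 1.0, "p0": 0.5}, loads), loads)
```

With a 6-block group and a cap of 3, compact partitioning uses two clusters and needs one cross-cluster block per repair on average, versus five under flat placement. The node-selection rule keeps each stripe's blocks on distinct nodes, so a single node failure costs at most one block of the stripe; applying the same rule to per-cluster loads would give a cluster-level variant.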
About the journal:
Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications.
Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration.
Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.