{"title":"Clustered Distributed Data Storage Repairing Multiple Failures.","authors":"Shiqiu Liu, Fangwei Ye, Qihui Wu","doi":"10.3390/e27030313","DOIUrl":null,"url":null,"abstract":"<p><p>A clustered distributed storage system (DSS), also called a rack-aware storage system, is a distributed storage system in which the nodes are grouped into several clusters. The communication between two clusters may be restricted by their connectivity; that is to say, the communication cost between nodes differs depending on their location. As such, when repairing a failed node, downloading data from nodes that are in the same cluster is much cheaper and more efficient than downloading data from nodes in another cluster. In this article, we consider a scenario in which the failed nodes only download data from nodes in the same cluster, which is an extreme and important case that leverages the fact that the intra-cluster bandwidth is much cheaper than the cross-cluster repair bandwidth. Also, we study the problem of repairing multiple failures in this article, which allows for collaboration within the same cluster, i.e., failed nodes in the same cluster can exchange data with each other. We derive the trade-off between the storage and repair bandwidth for the clustered DSSs and provide explicit code constructions achieving two extreme points in the trade-off, namely the minimum storage clustered collaborative repair (MSCCR) point and the minimum bandwidth clustered collaborative repair (MBCCR) point, respectively.</p>","PeriodicalId":11694,"journal":{"name":"Entropy","volume":"27 3","pages":""},"PeriodicalIF":2.1000,"publicationDate":"2025-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11941202/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Entropy","FirstCategoryId":"101","ListUrlMain":"https://doi.org/10.3390/e27030313","RegionNum":3,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"PHYSICS, MULTIDISCIPLINARY","Score":null,"Total":0}
Clustered Distributed Data Storage Repairing Multiple Failures.
A clustered distributed storage system (DSS), also called a rack-aware storage system, is a distributed storage system in which the nodes are grouped into several clusters. The communication between two clusters may be restricted by their connectivity; that is to say, the communication cost between nodes differs depending on their location. As such, when repairing a failed node, downloading data from nodes that are in the same cluster is much cheaper and more efficient than downloading data from nodes in another cluster. In this article, we consider a scenario in which the failed nodes only download data from nodes in the same cluster, which is an extreme and important case that leverages the fact that the intra-cluster bandwidth is much cheaper than the cross-cluster repair bandwidth. Also, we study the problem of repairing multiple failures in this article, which allows for collaboration within the same cluster, i.e., failed nodes in the same cluster can exchange data with each other. We derive the trade-off between the storage and repair bandwidth for the clustered DSSs and provide explicit code constructions achieving two extreme points in the trade-off, namely the minimum storage clustered collaborative repair (MSCCR) point and the minimum bandwidth clustered collaborative repair (MBCCR) point, respectively.
期刊介绍:
Entropy (ISSN 1099-4300), an international and interdisciplinary journal of entropy and information studies, publishes reviews, regular research papers and short notes. Our aim is to encourage scientists to publish as much as possible their theoretical and experimental details. There is no restriction on the length of the papers. If there are computation and the experiment, the details must be provided so that the results can be reproduced.