Boosting Cross-rack Multi-stripe Repair in Heterogeneous Erasure-coded Clusters
H. Zhou, D. Feng
Proceedings of the 51st International Conference on Parallel Processing, 2022-08-29
DOI: 10.1145/3545008.3545029 (https://doi.org/10.1145/3545008.3545029)
Large-scale distributed storage systems have adopted erasure coding to guarantee high data reliability, but at the expense of high repair costs. In practice, storage nodes are grouped into racks, and the data blocks on those nodes are organized into multiple stripes, each encoded independently. Because cross-rack bandwidth is both scarce and heterogeneous, cross-rack network transmission dominates the overall repair cost. We argue that when erasure coding is deployed in a rack-based architecture, existing repair techniques are limited in several respects: they neglect the heterogeneity of cross-rack bandwidth, give little consideration to multi-stripe failures, apply no special treatment to repair link scheduling, and target only specific erasure code constructions. In this paper, we present CMRepair, an efficient Cross-rack Multi-stripe Repair technique that aims to reduce the repair time of multi-stripe failures in heterogeneous erasure-coded clusters. CMRepair carefully chooses the nodes for reading and repairing blocks and greedily searches for a near-optimal multi-stripe repair solution that reduces the cross-rack repair time while introducing only negligible computational overhead. Furthermore, it selectively schedules the execution order of cross-rack links, with the primary objectives of saturating unused upload/download bandwidth and avoiding network congestion. CMRepair can also be extended to handle full-node repair and multi-failure repair, and to adapt to different erasure codes. Experiments show that CMRepair reduces the cross-rack repair time by 6.42%-62.50% and improves the repair throughput by 24.94%-53.91%.
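To make the idea of bandwidth-aware greedy helper selection concrete, the following is a minimal Python sketch. It is not CMRepair's algorithm, which the abstract does not specify in detail. It assumes a code in which any k surviving blocks of a stripe suffice for repair, measures bandwidth in blocks per second, ignores intra-rack aggregation and link scheduling, and models repair time as the slowest cross-rack link. The names `Rack`, `repair_time`, and `greedy_plan` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Rack:
    """Hypothetical rack model; bandwidths are in blocks per second (assumed unit)."""
    up_bw: float        # cross-rack upload bandwidth
    down_bw: float      # cross-rack download bandwidth
    up_load: int = 0    # blocks this rack must send to other racks
    down_load: int = 0  # blocks this rack must receive from other racks


def repair_time(racks):
    """Bottleneck estimate: the slowest link, assuming all links run in parallel."""
    return max(
        max(r.up_load / r.up_bw, r.down_load / r.down_bw) for r in racks.values()
    )


def greedy_plan(racks, stripes, k):
    """Pick k helper blocks per stripe, one stripe at a time.

    stripes: list of (dest_rack, {rack_name: surviving_block_count}).
    Blocks already in the destination rack are used first because they cost
    no cross-rack traffic. Each remaining block is taken from the remote rack
    whose upload link would finish soonest after sending one more block.
    """
    plan = []
    for dest, survivors in stripes:
        avail = dict(survivors)
        local = min(avail.get(dest, 0), k)
        chosen = {dest: local} if local else {}
        for _ in range(k - local):
            candidates = [r for r, n in avail.items() if r != dest and n > 0]
            if not candidates:
                raise ValueError("fewer than k surviving blocks in this stripe")
            best = min(
                candidates,
                key=lambda r: (racks[r].up_load + 1) / racks[r].up_bw,
            )
            avail[best] -= 1
            chosen[best] = chosen.get(best, 0) + 1
            racks[best].up_load += 1    # helper rack uploads one block
            racks[dest].down_load += 1  # destination rack downloads it
        plan.append((dest, chosen))
    return plan


if __name__ == "__main__":
    # Illustrative cluster: r1 has the fastest cross-rack links, r2 the slowest.
    racks = {"r0": Rack(2.0, 2.0), "r1": Rack(4.0, 4.0), "r2": Rack(1.0, 1.0)}
    stripes = [
        ("r0", {"r0": 1, "r1": 2, "r2": 2}),
        ("r0", {"r1": 1, "r2": 3}),
    ]
    print(greedy_plan(racks, stripes, k=3))
    print(repair_time(racks))
```

Because the per-rack loads persist across stripes, later stripes see the traffic already committed by earlier ones, which is the sense in which the selection is multi-stripe. A fuller treatment would also need to order the transfers on each link, which this sketch leaves out.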