CoRec: A Cooperative Reconstruction Pattern for Multiple Failures in Erasure-Coded Storage Clusters
Jianzhong Huang, Er-wei Dai, C. Xie, X. Qin
2015 44th International Conference on Parallel Processing (ICPP), September 2015
DOI: 10.1109/ICPP.2015.56
Citations: 1
Abstract
Speeding up reconstruction in erasure-coded storage clusters is essential, because fast data recovery helps shorten the window of vulnerability and improve storage-system reliability. To address double- and multiple-node failures, this paper proposes CoRec, a cooperative reconstruction pattern that minimizes reconstruction traffic. CoRec not only lets all rebuilding nodes reconstruct failed blocks collaboratively but also ensures that each surviving block is transferred over the network only once. To put the two CoRec-based reconstruction schemes (CoRec-rn and CoRec-sn) in context, we also investigate two alternative schemes, CRec and DRec. We develop reconstruction-time models, validated against empirical data, to estimate the reconstruction performance of large-scale storage clusters and to pinpoint bottlenecks in the reconstruction process. We implement a proof-of-concept prototype in which the four schemes are evaluated quantitatively. Experimental results show that CoRec-rn and CoRec-sn significantly reduce reconstruction time relative to CRec and DRec. In a real-world 9-node storage cluster, CoRec-rn speeds up double-node reconstruction over CRec and DRec by a factor of at least 1.72, and CoRec-sn does so by a factor of at least 4.76.
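The abstract states CoRec's key property (rebuilding nodes cooperate, and each surviving block crosses the network only once) but does not spell out the data flow. The Python sketch below shows one way such cooperation can work for a linear erasure code, under illustrative assumptions: a toy systematic code over the prime field GF(257) with k data blocks and two parity blocks, one rebuilding node per failed block, and surviving "helper" blocks assigned to rebuilding nodes round-robin. Because every lost block is a linear combination of any k surviving blocks, each rebuilding node can fold the helpers it receives into partial sums for all failed blocks and ship only those partial sums to its peers. The functions `generator`, `encode`, `invert`, and `cooperative_rebuild` are hypothetical names; this is not the paper's CoRec-rn or CoRec-sn algorithm, whose distribution of work the abstract does not describe.

```python
from typing import Dict, List, Tuple

P = 257  # hypothetical prime field GF(257); deployed codes usually work in GF(2^8)


def generator(k: int, m: int = 2) -> List[List[int]]:
    """Systematic generator: k identity rows, then m Vandermonde-style parity rows.
    For m = 2 and k < P, any k rows are linearly independent (MDS); larger m is not guaranteed."""
    rows = [[1 if i == j else 0 for i in range(k)] for j in range(k)]
    rows += [[pow(i + 1, j, P) for i in range(k)] for j in range(m)]
    return rows


def invert(a: List[List[int]]) -> List[List[int]]:
    """Gauss-Jordan inversion of a square matrix modulo P."""
    n = len(a)
    aug = [row[:] + [1 if i == j else 0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] % P)
        aug[col], aug[piv] = aug[piv], aug[col]
        inv = pow(aug[col][col], P - 2, P)  # Fermat inverse, valid because P is prime
        aug[col] = [x * inv % P for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(x - f * y) % P for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def encode(data: List[List[int]], g: List[List[int]]) -> List[List[int]]:
    """Each stored block is a symbol-wise linear combination of the k data blocks."""
    width = len(data[0])
    return [[sum(c * d[s] for c, d in zip(row, data)) % P for s in range(width)] for row in g]


def cooperative_rebuild(
    stripe: List[List[int]], g: List[List[int]], failed: List[int], k: int
) -> Tuple[Dict[int, List[int]], int]:
    """Hypothetical cooperative repair; returns rebuilt blocks and the number of block transfers."""
    helpers = [i for i in range(len(stripe)) if i not in failed][:k]
    dec = invert([g[h] for h in helpers])
    # coeff[f][j]: weight of helper j in failed block f (row f of G times the inverse).
    coeff = {f: [sum(g[f][t] * dec[t][j] for t in range(k)) % P for j in range(k)] for f in failed}
    width = len(stripe[helpers[0]])
    transfers = 0
    # partial[r][f]: rebuilding node r's partial sum for failed block f.
    partial = {r: {f: [0] * width for f in failed} for r in failed}
    for j, h in enumerate(helpers):
        r = failed[j % len(failed)]  # helper j goes to exactly one rebuilding node
        transfers += 1
        for f in failed:
            partial[r][f] = [(acc + coeff[f][j] * x) % P for acc, x in zip(partial[r][f], stripe[h])]
    rebuilt: Dict[int, List[int]] = {}
    for f in failed:
        block = [0] * width
        for r in failed:
            if r != f:
                transfers += 1  # node r ships its partial sum for block f to node f
            block = [(a + b) % P for a, b in zip(block, partial[r][f])]
        rebuilt[f] = block
    return rebuilt, transfers


if __name__ == "__main__":
    g = generator(4)
    data = [[10, 20, 30], [40, 50, 60], [70, 80, 90], [100, 110, 120]]
    stripe = encode(data, g)
    rebuilt, transfers = cooperative_rebuild(stripe, g, [0, 2], k=4)
    print(rebuilt[0] == stripe[0], rebuilt[2] == stripe[2], transfers)  # True True 6
```

In this toy setting, recovering f failed blocks costs k + f(f-1) block transfers: k helper blocks, each sent once, plus f(f-1) partial sums exchanged among rebuilding nodes. For the 4+2 stripe above that is 6 transfers. A naive scheme in which every rebuilding node fetches all k helpers on its own (assumed here only for contrast, not necessarily the paper's DRec) would move f·k = 8 blocks.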