CLRC: a New Erasure Code Localization Algorithm for HDFS

Ying Fang, Shuai Wang, Hai Tan, Xin Zhang, Jun Zhang

2021 International Conference on Computer Engineering and Artificial Intelligence (ICCEAI), August 2021. DOI: 10.1109/ICCEAI52939.2021.00012
With the continuous growth of big data, hardware expansion for HDFS has fallen far behind the volume of data to be stored. As a redundancy strategy, traditional data replication has gradually been replaced by Erasure Code, which offers a lower redundancy rate and smaller storage overhead. However, compared with replication, Erasure Code must read a certain number of data blocks during data recovery, incurring substantial I/O and network overhead. Based on the RS algorithm, a new CLRC algorithm is proposed that improves the locality of RS coding by grouping RS coded blocks and generating local check blocks. Evaluations show that when a single block is damaged, the algorithm reduces bandwidth and I/O consumption during recovery by about 61%; moreover, its decoding time is only 59% of that of the RS algorithm.
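The locality idea behind grouping coded blocks and adding local check blocks can be illustrated with a minimal sketch. This is an assumption-laden toy model, not the paper's actual construction: it uses simple XOR parity per group (the general LRC pattern), whereas CLRC builds on RS coding; all function names are illustrative.

```python
# Toy sketch of LRC-style local parity, illustrating why grouping helps:
# repairing one lost block only needs the blocks in its group plus the
# group's local parity, not all k blocks as in plain RS decoding.
# XOR parity here is an illustrative stand-in for the paper's construction.

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

def make_local_parities(data_blocks, group_size):
    """Split the stripe into groups and compute one local parity per group."""
    groups = [data_blocks[i:i + group_size]
              for i in range(0, len(data_blocks), group_size)]
    return groups, [xor_blocks(g) for g in groups]

def recover_single(group, parity, lost_index):
    """Rebuild one lost block from the surviving group members and the
    local parity: far fewer reads than a full-stripe RS repair."""
    survivors = [b for i, b in enumerate(group) if i != lost_index]
    return xor_blocks(survivors + [parity])

# Example: a 6-block stripe split into two groups of 3.
data = [bytes([i]) * 4 for i in range(6)]
groups, parities = make_local_parities(data, 3)
rebuilt = recover_single(groups[0], parities[0], 1)
assert rebuilt == data[1]  # only 3 blocks read instead of 6
```

Under this model, single-block repair touches `group_size` blocks rather than the whole stripe, which is the kind of bandwidth and I/O saving the evaluation reports.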