CoARC: Co-operative, Aggressive Recovery and Caching for Failures in Erasure Coded Hadoop
P. Subedi, Ping Huang, Tong Liu, Joseph Moore, S. Skelton, Xubin He
2016 45th International Conference on Parallel Processing (ICPP), 2016-08-01
DOI: 10.1109/ICPP.2016.40 (https://doi.org/10.1109/ICPP.2016.40)
Citations: 5
Abstract
Cloud file systems such as Hadoop have become the norm for handling big data because they scale easily and distribute storage across nodes. However, these systems are susceptible to failures, and data must be recovered when a failure is detected. During temporary failures, MapReduce jobs or file system clients perform degraded reads to satisfy read requests. We argue that two aspects of degraded reads, the lack of sharing of recovered data and the recovery of only the requested data block, place a heavy strain on the system's network resources and increase job execution time. To this end, we propose CoARC (Co-operative, Aggressive Recovery and Caching), a new recovery mechanism for unavailable data during degraded reads in distributed file systems. The main idea is to recover not only the requested data block but also the other temporarily unavailable blocks in the same stripe, and to cache them on a separate data node. We also propose an LRF (Least Recently Failed) cache replacement algorithm for such recovery caches. Finally, we show that CoARC significantly reduces network usage and job runtime in erasure-coded Hadoop.
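To make the network argument concrete, the following is a minimal Python sketch of the idea, not the authors' implementation. It only counts block transfers and performs no real decoding. Several things here are assumptions that do not come from the abstract: the names `Stripe`, `RecoveryCache`, and `degraded_read`; the (6, 3) code parameters; the premise that any k surviving blocks suffice to decode; the cost of one transfer per block shipped to the caching node; and the reading of "least recently failed" as "evict the block whose failure was detected longest ago".

```python
from dataclasses import dataclass, field


@dataclass
class Stripe:
    """One erasure-coded stripe (hypothetical model for illustration)."""
    k: int  # number of data blocks
    m: int  # number of parity blocks
    failed_at: dict = field(default_factory=dict)  # block index -> failure time

    def survivors(self):
        return self.k + self.m - len(self.failed_at)


class RecoveryCache:
    """Holds recovered blocks; evicts the entry with the oldest failure time.

    Assumption: this is one plausible reading of "Least Recently Failed".
    The abstract names the policy but does not define it.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}  # (stripe_id, block index) -> failure time

    def __contains__(self, key):
        return key in self.entries

    def put(self, key, failed_at):
        if key not in self.entries and len(self.entries) >= self.capacity:
            victim = min(self.entries, key=self.entries.get)  # oldest failure
            del self.entries[victim]
        self.entries[key] = failed_at


def degraded_read(stripe_id, stripe, block, cache=None):
    """Return the block transfers needed to read one unavailable block."""
    if cache is not None and (stripe_id, block) in cache:
        return 1  # served from the caching node
    if stripe.survivors() < stripe.k:
        raise ValueError("not enough surviving blocks to decode")
    transfers = stripe.k  # fetch k surviving blocks, then decode locally
    if cache is not None:
        # Aggressive step: rebuild every unavailable block of the stripe,
        # not just the requested one, and ship each to the caching node.
        for b, t in stripe.failed_at.items():
            cache.put((stripe_id, b), t)
            transfers += 1
    return transfers


# Two blocks of one stripe are unavailable; three degraded reads arrive.
stripe = Stripe(k=6, m=3, failed_at={0: 100.0, 1: 100.0})
reads = [0, 0, 1]

baseline = sum(degraded_read("s0", stripe, b) for b in reads)
cache = RecoveryCache(capacity=4)
shared = sum(degraded_read("s0", stripe, b, cache) for b in reads)
print(baseline, shared)  # 18 10
```

Under these assumptions, the unshared baseline pays the full k-block fetch on every degraded read, while the shared variant pays it once per stripe and serves later reads with a single transfer. The actual savings in a real cluster depend on the code parameters, the access pattern, and where the cache is placed.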