Fast recovery for large disk enclosures based on RAID2.0: Algorithms and evaluation
Authors: Qiliang Li, Min Lyu, Liangliang Xu, Yinlong Xu
Journal: Journal of Parallel and Distributed Computing (impact factor 3.4; JCR Q1, Computer Science, Theory & Methods)
Publication date: 2024-02-07
DOI: 10.1016/j.jpdc.2024.104854
URL: https://www.sciencedirect.com/science/article/pii/S0743731524000182
Citation count: 0
Abstract
The RAID2.0 architecture, which uses dozens or even hundreds of disks, is widely adopted for large-capacity data storage. However, because resources such as memory and CPU are limited, RAID2.0 recovers from disk failures in batches. Traditional random data placement and recovery schemes produce highly skewed I/O access within a batch, which slows recovery. To address this issue, we propose DR-RAID, an efficient reconstruction scheme that balances local rebuilding workloads across all surviving disks within a batch. We dynamically select a batch of tasks with nearly balanced read loads and make intra-batch adjustments for tasks that have more than one possible choice of source chunks to read. Furthermore, we use a bipartite graph model to distribute write loads uniformly. DR-RAID applies to disks with either homogeneous or heterogeneous rebuilding bandwidth. Experimental results show that in offline rebuilding, DR-RAID improves rebuilding throughput by up to 61.90% compared with the random data placement scheme. With varied rebuilding bandwidth, the improvement reaches up to 65.00%.
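To make the idea of selecting a batch with nearly balanced read loads concrete, here is a minimal sketch of one possible greedy heuristic. It is not the DR-RAID algorithm, which the abstract does not specify. It assumes each rebuild task is described only by the set of surviving disks it must read from, and that every read costs the same. The names `peak_read_load` and `select_balanced_batch` and the example data are hypothetical.

```python
from collections import Counter
from typing import FrozenSet, List, Sequence


def peak_read_load(tasks: Sequence[FrozenSet[int]], batch: Sequence[int]) -> int:
    """Return the largest number of chunk reads any single disk serves in the batch."""
    load = Counter(disk for i in batch for disk in tasks[i])
    return max(load.values(), default=0)


def select_balanced_batch(tasks: Sequence[FrozenSet[int]], batch_size: int) -> List[int]:
    """Greedily pick up to batch_size task indices, keeping the busiest disk as idle as possible.

    Illustrative assumption: tasks[i] is the set of surviving disk IDs that
    task i reads one chunk from. Ties are broken by the total load already
    placed on the task's disks, then by task index, so the result is deterministic.
    """
    load: Counter = Counter()          # reads assigned to each disk so far
    chosen: List[int] = []
    pending = list(range(len(tasks)))

    while pending and len(chosen) < batch_size:
        current_peak = max(load.values(), default=0)

        def cost(i: int):
            # Peak per-disk load if task i were added to the batch.
            peak_after = max([current_peak] + [load[d] + 1 for d in tasks[i]])
            return (peak_after, sum(load[d] for d in tasks[i]), i)

        best = min(pending, key=cost)
        pending.remove(best)
        chosen.append(best)
        load.update(tasks[best])       # each source disk serves one more read

    return chosen


# Hypothetical example: six rebuild tasks over four surviving disks (0..3).
example_tasks = [frozenset(s) for s in ({0, 1}, {0, 1}, {0, 2}, {2, 3}, {1, 3}, {0, 3})]
in_order_batch = [0, 1]                                    # first two tasks as queued
balanced_batch = select_balanced_batch(example_tasks, 2)   # greedy choice
print(peak_read_load(example_tasks, in_order_batch),
      peak_read_load(example_tasks, balanced_batch))       # prints: 2 1
```

In this toy case, taking tasks in queue order makes disks 0 and 1 each serve two reads, while the greedy choice spreads the batch so that no disk serves more than one. A real scheme would also need to weight disks by their rebuilding bandwidth and balance write targets, which this sketch leaves out.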
About the journal:
This international journal is directed to researchers, engineers, educators, managers, programmers, and users of computers who have particular interests in parallel processing and/or distributed computing.
The Journal of Parallel and Distributed Computing publishes original research papers and timely review articles on the theory, design, evaluation, and use of parallel and/or distributed computing systems. The journal also features special issues on these topics; again covering the full range from the design to the use of our targeted systems.