Improving Restore Performance of Deduplication Systems via a Greedy Rewriting Scheme
Lifang Lin, Yuhui Deng, Yi Zhou
2021 IEEE 27th International Conference on Parallel and Distributed Systems (ICPADS), December 2021
DOI: 10.1109/ICPADS53394.2021.00042
Data deduplication has been widely used to improve storage space utilization; however, it suffers from data fragmentation: logically consecutive chunks end up physically scattered across many containers. Many rewriting schemes try to alleviate the resulting restore performance degradation by rewriting fragmented duplicate chunks into new containers. Unfortunately, these schemes rely on a fixed threshold and fail to choose the appropriate set of old containers for rewriting, so the containers retrieved when restoring a backup still hold a substantial number of redundant chunks. To address this issue, we propose a flexible-threshold rewriting scheme that improves restore performance while maintaining high backup performance. We define an effectiveness metric, valid container reference counts (VCRC), that helps identify the appropriate containers for rewriting. We then design a greedy algorithm called F-greedy that dynamically adjusts the threshold according to the distribution of containers' VCRC, aiming to rewrite low-VCRC containers. We quantitatively evaluate F-greedy on three real-world backup datasets in terms of restore performance, backup performance, and storage overhead. The empirical results show that, compared with two state-of-the-art schemes (Capping and SMR), our scheme improves restore speed by 1.3x to 2.4x while achieving similar backup performance.
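The abstract does not say exactly how VCRC is computed or how F-greedy picks its threshold, so the following is only a hypothetical sketch of the general idea, not the authors' algorithm. It assumes that the backup stream is processed in segments, that a container's VCRC is the number of distinct chunks in the current segment that reference it, and that a per-segment rewrite budget (the made-up parameter `budget_ratio`) caps how many duplicate chunks may be rewritten. Rewriting a low-VCRC container costs few chunk writes but removes a whole container read from the restore path. The greedy step therefore selects containers in ascending VCRC order until the budget is spent. The VCRC of the last container selected serves as the segment's effective threshold, so it shifts with the VCRC distribution instead of staying fixed. All function names here are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

# A segment is a list of (fingerprint, old_container_id) pairs; the container id is
# None for unique chunks that are not duplicates of anything already stored.
Segment = List[Tuple[str, Optional[int]]]


def vcrc(segment: Segment) -> Dict[int, int]:
    """Hypothetical VCRC: distinct chunks in this segment that reference each old container."""
    refs: Dict[int, Set[str]] = defaultdict(set)
    for fp, cid in segment:
        if cid is not None:
            refs[cid].add(fp)
    return {cid: len(fps) for cid, fps in refs.items()}


def choose_rewrites(counts: Dict[int, int], budget: int) -> Tuple[Set[int], int]:
    """Greedily pick the lowest-VCRC containers whose chunks fit in the rewrite budget.

    Returns the chosen container ids and the effective threshold
    (VCRC of the last container chosen, or 0 if none fit).
    """
    chosen: Set[int] = set()
    spent = 0
    threshold = 0
    # Ascending VCRC; ties broken by container id so the result is deterministic.
    for cid, n in sorted(counts.items(), key=lambda kv: (kv[1], kv[0])):
        if spent + n > budget:
            break  # stop at the first container that does not fit
        chosen.add(cid)
        spent += n
        threshold = n
    return chosen, threshold


def plan_segment(segment: Segment, budget_ratio: float = 0.05) -> Tuple[List[str], int]:
    """Return the fingerprints to rewrite (in stream order) and the effective threshold."""
    budget = int(len(segment) * budget_ratio)
    chosen, threshold = choose_rewrites(vcrc(segment), budget)
    rewrites: List[str] = []
    seen: Set[str] = set()
    for fp, cid in segment:
        # Duplicates whose old container was selected are written into a new container;
        # all other duplicates keep referencing their existing containers.
        if cid in chosen and fp not in seen:
            seen.add(fp)
            rewrites.append(fp)
    return rewrites, threshold


SEGMENT: Segment = [
    ("a", 1), ("b", 1), ("c", 1), ("d", 2),
    ("e", 3), ("f", 3), ("g", None), ("h", 4),
]

if __name__ == "__main__":
    print(plan_segment(SEGMENT, budget_ratio=0.375))  # budget of 3 chunks
    print(plan_segment(SEGMENT, budget_ratio=0.5))    # budget of 4 chunks
```

For the toy segment, a budget of 3 chunks rewrites only the two single-reference containers (threshold 1). A budget of 4 also rewrites container 3 (threshold 2), while container 1, with the highest VCRC, is always kept. A fixed-threshold scheme would apply the same cutoff to every segment no matter how its VCRC values are distributed.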