Improving Restore Performance of Deduplication Systems via a Greedy Rewriting Scheme

Lifang Lin, Yuhui Deng, Yi Zhou
Published in: 2021 IEEE 27th International Conference on Parallel and Distributed Systems (ICPADS), 2021-12-01
DOI: 10.1109/ICPADS53394.2021.00042 (https://doi.org/10.1109/ICPADS53394.2021.00042)
Citations: 1

Abstract

Data deduplication is widely used to improve storage space utilization, but it is hampered by data fragmentation: logically consecutive chunks end up physically scattered across many containers. Many rewriting schemes, which rewrite fragmented duplicate chunks into new containers, attempt to alleviate the restore performance degradation that fragmentation causes. Unfortunately, these schemes rely on a fixed threshold and fail to choose the appropriate set of old containers for rewriting, so the containers retrieved during a restore still hold a substantial number of redundant chunks. To address this issue, we propose a flexible-threshold rewriting scheme that improves restore performance while maintaining high backup performance. We define an effectiveness metric, valid container reference counts (VCRC), that facilitates identifying the appropriate containers for rewriting. We design a greedy algorithm called F-greedy that dynamically adjusts the threshold according to the distribution of the containers' VCRC, aiming to rewrite low-VCRC containers. We quantitatively evaluate F-greedy on three real-world backup datasets in terms of restore performance, backup performance, and storage overhead. The empirical results show that, compared with two state-of-the-art schemes (Capping and SMR), our scheme improves the restore speed of the existing algorithms by 1.3x to 2.4x while achieving similar backup performance.
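The idea of choosing rewrite candidates from the distribution of per-container reference counts, rather than from a fixed cutoff, can be illustrated with a small sketch. This is not the paper's F-greedy algorithm, and the abstract does not give the exact definition of VCRC. The sketch assumes that a container's count is simply the number of chunks in the current backup segment that refer to it, and it uses the mean of those counts as a stand-in threshold rule. The function names and the (chunk, container) input format are hypothetical.

```python
from collections import Counter
from statistics import mean


def reference_counts(segment):
    """Count how many chunks in the segment refer to each old container.

    `segment` is a list of (chunk_id, container_id) pairs for duplicate
    chunks. This count is an assumed stand-in for VCRC.
    """
    return Counter(container_id for _chunk_id, container_id in segment)


def flexible_threshold(counts):
    """Derive the threshold from the observed counts (assumed rule: the mean).

    A fixed-threshold scheme would return a constant here instead.
    """
    return mean(counts.values()) if counts else 0


def select_rewrites(segment):
    """Return the chunks whose container falls below the threshold.

    These chunks would be rewritten into a new container so that a restore
    no longer has to read the sparsely referenced old containers.
    """
    counts = reference_counts(segment)
    threshold = flexible_threshold(counts)
    low = {cid for cid, n in counts.items() if n < threshold}
    return [chunk_id for chunk_id, cid in segment if cid in low]
```

Because the threshold is recomputed for every segment, a segment whose references are spread thinly over many containers and a segment concentrated in a few containers yield different cutoffs, which is the behavior a fixed threshold cannot provide.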