Reducing Chunk Fragmentation for In-Line Delta Compressed and Deduplicated Backup Systems

Yucheng Zhang, D. Feng, Yu Hua, Yuchong Hu, Wen Xia, Min Fu, Xiaolan Tang, Zhikun Wang, Fangting Huang, Yukun Zhou
{"title":"Reducing Chunk Fragmentation for In-Line Delta Compressed and Deduplicated Backup Systems","authors":"Yucheng Zhang, D. Feng, Yu Hua, Yuchong Hu, Wen Xia, Min Fu, Xiaolan Tang, Zhikun Wang, Fangting Huang, Yukun Zhou","doi":"10.1109/NAS.2017.8026874","DOIUrl":null,"url":null,"abstract":"Chunk-level deduplication, while robust in removing duplicate chunks, introduces chunk fragmentation which decreases restore performance. Rewriting algorithms are proposed to reduce the chunk fragmentation and accelerate the restore speed. Delta compression can remove redundant data between non-duplicate but similar chunks which cannot be eliminated by chunk-level deduplication. Some applications use delta compression as a complement for chunk-level deduplication to attain extra space and bandwidth savings. However, we observe that delta compression introduces a new type of chunk fragmentation stemming from delta compressed chunks whose base chunks are fragmented. We refer to such delta compressed chunks as base-fragmented chunks. We found that this new type of chunk fragmentation has a more severely impact on the restore performance than the chunk fragmentation introduced by chunk-level deduplication and cannot be reduced by existing rewriting algorithms. In order to address the problem due to the base-fragmented chunks, we propose SDC, a scheme that selectively performs delta compression after chunk-level deduplication. The main idea behind SDC is to simulate a restore cache to identify the non-base-fragmented chunks and only perform delta compression for these chunks, thus avoiding the new type of chunk fragmentation. Due to the locality among the backup streams, most of the non-base-fragmented chunks can be detected by the simulated restore cache. Experimental results based on real-world datasets show that SDC improves the restore performance of the delta compressed and deduplicated backup system by 1.93X-7.48X, and achieves 95.5%-97.4% of its compression, while imposing negligible impact on the backup throughput.","PeriodicalId":222161,"journal":{"name":"2017 International Conference on Networking, Architecture, and Storage (NAS)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 International Conference on Networking, Architecture, and Storage (NAS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/NAS.2017.8026874","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6

Abstract

Chunk-level deduplication, while effective at removing duplicate chunks, introduces chunk fragmentation, which degrades restore performance. Rewriting algorithms have been proposed to reduce chunk fragmentation and accelerate restore speed. Delta compression can remove redundant data between non-duplicate but similar chunks, redundancy that chunk-level deduplication cannot eliminate. Some applications therefore use delta compression as a complement to chunk-level deduplication to attain extra space and bandwidth savings. However, we observe that delta compression introduces a new type of chunk fragmentation, stemming from delta compressed chunks whose base chunks are fragmented. We refer to such delta compressed chunks as base-fragmented chunks. We found that this new type of chunk fragmentation has a more severe impact on restore performance than the chunk fragmentation introduced by chunk-level deduplication, and it cannot be reduced by existing rewriting algorithms. To address the problem caused by base-fragmented chunks, we propose SDC, a scheme that selectively performs delta compression after chunk-level deduplication. The main idea behind SDC is to simulate a restore cache to identify non-base-fragmented chunks and to perform delta compression only for those chunks, thus avoiding the new type of chunk fragmentation. Owing to the locality among backup streams, most non-base-fragmented chunks can be detected by the simulated restore cache. Experimental results on real-world datasets show that SDC improves the restore performance of a delta compressed and deduplicated backup system by 1.93X-7.48X and retains 95.5%-97.4% of its compression ratio, while imposing a negligible impact on backup throughput.
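
To make the simulated-restore-cache idea concrete, the following is a minimal Python sketch of the decision SDC makes during backup, under stated assumptions. All names here (SimulatedRestoreCache, should_delta_compress, integer container IDs, the cache capacity) are illustrative, not the paper's implementation: the point is that the backup process replays the chunk stream through an LRU cache of container IDs, mirroring the reads a future restore would perform, and a new chunk is delta compressed only if its base chunk's container would still be cache-resident at restore time.

```python
from collections import OrderedDict


class SimulatedRestoreCache:
    """LRU cache of container IDs, emulating the container cache that a
    future restore of this backup stream would maintain. Capacity is
    measured in containers. Names are illustrative, not from the paper."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._lru = OrderedDict()  # container_id -> None, most recent last

    def record_read(self, container_id: int) -> bool:
        """Note that the restore would read this container next; return
        True if it would already be cache-resident (a hit)."""
        hit = container_id in self._lru
        if hit:
            self._lru.move_to_end(container_id)
        else:
            self._lru[container_id] = None
            if len(self._lru) > self.capacity:
                self._lru.popitem(last=False)  # evict the LRU container
        return hit

    def contains(self, container_id: int) -> bool:
        return container_id in self._lru


def should_delta_compress(base_container_id: int,
                          sim_cache: SimulatedRestoreCache) -> bool:
    """SDC's core test (simplified): delta-compress a chunk only if its
    base chunk's container would be resident in the restore cache, so
    the resulting delta chunk cannot become base-fragmented."""
    return sim_cache.contains(base_container_id)


# Replay the backup stream in order: each stored chunk's container ID is
# fed into the simulated cache, just as a restore would read them.
sim = SimulatedRestoreCache(capacity=4)
stream = [1, 1, 2, 3, 2, 7, 9]        # container IDs of chunks, in stream order
for cid in stream:
    sim.record_read(cid)

print(should_delta_compress(2, sim))  # True: container 2 is still resident
print(should_delta_compress(1, sim))  # False: container 1 was evicted
```

In this sketch the base whose container was evicted is rejected for delta compression and its chunk would be stored whole, trading a little compression (the paper reports retaining 95.5%-97.4% of it) for restore locality; the similarity search that finds a candidate base chunk in the first place is outside the scope of this example.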