DAM: A DataOwnership-Aware Multi-layered De-duplication Scheme

Yujuan Tan, D. Feng, Zhichao Yan, Guohui Zhou
Published in: 2010 IEEE Fifth International Conference on Networking, Architecture, and Storage, 2010-07-15
DOI: 10.1109/NAS.2010.57 (https://doi.org/10.1109/NAS.2010.57)
Citations: 7

Abstract

Chunk-level de-duplication saves storage in backup and archiving systems, but a prominent challenge for the technology is identifying duplicate chunks efficiently and effectively. Because main memory capacity is limited, most of the chunk fingerprints used to identify individual chunks are stored on disk. Checking for a fingerprint match on disk for every input chunk is known to be a severe performance bottleneck for the backup process. Our intuition and our analyses of real backup data both indicate, however, that duplicate chunks tend to concentrate strongly according to data ownership. Motivated by this observation, and to avoid or alleviate the bottleneck, we propose DAM, a data-ownership-aware multi-layered de-duplication scheme. DAM exploits the ownership of data chunks and uses a tri-layered de-duplication approach to narrow the search space for duplicate chunks, thereby reducing total disk accesses. Experimental results with real-world datasets show that DAM reduces disk accesses by an average of 60.8% and shortens de-duplication time by an average of 46.3%.
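The idea of narrowing the fingerprint search by data ownership can be illustrated with a small sketch. The abstract does not describe DAM's three layers, so the code below is not the authors' design: it is a simplified two-tier lookup in which a small per-owner index is consulted before a larger global index. The class name `OwnershipAwareIndex`, the use of SHA-1 fingerprints, and the `global_lookups` counter (a stand-in for on-disk index accesses) are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


class OwnershipAwareIndex:
    """Toy fingerprint index (hypothetical, not DAM itself).

    Checks the owner's own fingerprints first and only falls back
    to the global index when that check misses.
    """

    def __init__(self):
        # Assumed: a small per-owner index that is cheap to search.
        self.by_owner = defaultdict(set)
        # Assumed: stands in for the large on-disk fingerprint index.
        self.global_index = set()
        # Counts fall-backs to the global index (simulated disk accesses).
        self.global_lookups = 0

    @staticmethod
    def fingerprint(chunk: bytes) -> str:
        # SHA-1 is an illustrative choice of fingerprint function.
        return hashlib.sha1(chunk).hexdigest()

    def is_duplicate(self, owner: str, chunk: bytes) -> bool:
        fp = self.fingerprint(chunk)
        # Tier 1: duplicates tend to concentrate within one owner's data,
        # so a hit here avoids touching the global index at all.
        if fp in self.by_owner[owner]:
            return True
        # Tier 2: fall back to the global index and record the access.
        self.global_lookups += 1
        duplicate = fp in self.global_index
        self.global_index.add(fp)
        self.by_owner[owner].add(fp)
        return duplicate


if __name__ == "__main__":
    index = OwnershipAwareIndex()
    for owner, chunk in [("alice", b"report"), ("alice", b"report"), ("bob", b"report")]:
        print(owner, index.is_duplicate(owner, chunk))
    print("global lookups:", index.global_lookups)
```

In this toy model, a repeated chunk from the same owner is resolved without a global lookup, which mirrors the kind of saving the abstract attributes to ownership-aware search; the reported 60.8% and 46.3% figures come from the authors' experiments, not from this sketch.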