RMD: A Resemblance and Mergence Based Approach for High Performance Deduplication

Panfeng Zhang, Ping Huang, Xubin He, Hua Wang, Lingyu Yan, Ke Zhou
Published in: 2016 45th International Conference on Parallel Processing (ICPP)
Publication date: 2016-08-01
DOI: 10.1109/ICPP.2016.68 (https://doi.org/10.1109/ICPP.2016.68)
Citations: 7

Abstract

Data deduplication, a technique for eliminating redundant data, is used in almost all kinds of application environments to reduce storage space. One of its main challenges, however, is providing a fast key-value fingerprint index for large datasets, because index performance is critical to overall deduplication performance. This paper proposes RMD, a resemblance- and mergence-based deduplication scheme that aims to respond quickly to fingerprint queries. The key idea of RMD is to use a Bloom filter array together with a data resemblance algorithm to dramatically reduce the query range for deduplication. In addition, RMD uses a mergence-based approach to merge resembling segments into relevant bins, and applies a frequency-based Fingerprint Retention Policy to reduce bin capacity, which improves both query throughput and the deduplication ratio. Extensive experiments with real-world datasets show that RMD achieves high query performance and outperforms several state-of-the-art deduplication schemes.
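
The abstract names three mechanisms (a Bloom filter array, resemblance detection, and frequency-based fingerprint retention) without giving their details. The Python sketch below is only an illustration of how such pieces could fit together, not the authors' implementation. Everything in it is an assumption made for the example: the class names FingerprintIndex, Bin, and BloomFilter, the use of SHA-1 fingerprints, the choice of the minimum fingerprint as a segment's representative, the "keep the most frequent fingerprints" form of the retention policy, and all sizes.

```python
import hashlib
from collections import Counter


def fingerprint(chunk: bytes) -> str:
    """SHA-1 digest of a chunk, used here as its fingerprint (assumed)."""
    return hashlib.sha1(chunk).hexdigest()


class BloomFilter:
    """Small Bloom filter; the sizes are illustrative, not tuned."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key: str):
        # Derive one bit position per hash by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha1(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # May return a false positive, never a false negative.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


class Bin:
    """Fingerprints merged from resembling segments, guarded by one filter."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.filter = BloomFilter()
        self.hits = Counter()  # fingerprint -> number of times merged

    def merge(self, fingerprints) -> None:
        for fp in fingerprints:
            self.hits[fp] += 1
            self.filter.add(fp)
        if len(self.hits) > self.capacity:
            # Assumed retention policy: keep only the most frequent
            # fingerprints, then rebuild the filter to match.
            self.hits = Counter(dict(self.hits.most_common(self.capacity)))
            self.filter = BloomFilter()
            for fp in self.hits:
                self.filter.add(fp)


class FingerprintIndex:
    """One Bloom filter per bin; a query probes filters, not whole bins."""

    def __init__(self, bin_capacity: int = 1024):
        self.bin_capacity = bin_capacity
        self.bins = []

    def dedup_segment(self, chunks) -> int:
        """Return how many chunks of the segment were already indexed."""
        fps = [fingerprint(c) for c in chunks]
        if not fps:
            return 0
        # Assumed resemblance heuristic: the minimum fingerprint
        # represents the whole segment.
        rep = min(fps)
        target = next(
            (b for b in self.bins if b.filter.might_contain(rep)), None
        )
        if target is None:
            target = Bin(self.bin_capacity)
            self.bins.append(target)
        # Only the selected bin is searched, which narrows the query range.
        duplicates = sum(1 for fp in fps if fp in target.hits)
        target.merge(fps)
        return duplicates
```

Under these assumptions, calling dedup_segment twice with the same list of chunks reports no duplicates the first time and every chunk as a duplicate the second time, because the representative fingerprint steers the second query to the bin created by the first. A Bloom filter false positive can only send a segment to a non-resembling bin; it cannot produce a false duplicate, since the final check is an exact lookup in the bin.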