A Proposal for Improving Data Deduplication with Dual Side Fixed Size Chunking Algorithm

P. K. Krishnaprasad, Biju Abraham Narayamparambil
{"title":"基于双侧固定大小分块算法的重复数据删除改进方案","authors":"P. K. Krishnaprasad, Biju Abraham Narayamparambil","doi":"10.1109/ICACC.2013.10","DOIUrl":null,"url":null,"abstract":"DeDuplication is the technique of data reduction by breaking streams of data down into very granular components, and storing only the first instance of data items on the destination media and all the other similar occurrences to an index. Hash values are computed to identify the similar data items. Fixed size chunking (FSC) is a DeDuplication algorithm which breaks the data into fixed size chunks or blocks from the beginning of the file. But the main disadvantage of this technique is that, if new chunks are added in front or in the middle of a file, remaining chunks will get shifted from its initial position. This will yields a new hash value to the resulting chunks and thereby less DeDuplication ratio. But we can overcome this drawback by calculating hash values of chunks from the beginning as well as from the end of file and storing both values to metadata table. A new algorithm 'Dual Side Fixed Size Chunking' is proposed to get the high DeDuplication ratio over existing FSC. Without using computationally expensive Variable size chunking or content defined chunking, this algorithm can be effectively used for video or audio files to achieve a better DeDuplication ratio. This data reduction will provide network bandwidth savings and the ability to store more data on a given amount of disk or cloud storage. Reduced storage requirements will result in lower storage management and energy costs.","PeriodicalId":109537,"journal":{"name":"2013 Third International Conference on Advances in Computing and Communications","volume":"56 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"A Proposal for Improving Data Deduplication with Dual Side Fixed Size Chunking Algorithm\",\"authors\":\"P. K. Krishnaprasad, Biju Abraham Narayamparambil\",\"doi\":\"10.1109/ICACC.2013.10\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"DeDuplication is the technique of data reduction by breaking streams of data down into very granular components, and storing only the first instance of data items on the destination media and all the other similar occurrences to an index. Hash values are computed to identify the similar data items. Fixed size chunking (FSC) is a DeDuplication algorithm which breaks the data into fixed size chunks or blocks from the beginning of the file. But the main disadvantage of this technique is that, if new chunks are added in front or in the middle of a file, remaining chunks will get shifted from its initial position. This will yields a new hash value to the resulting chunks and thereby less DeDuplication ratio. But we can overcome this drawback by calculating hash values of chunks from the beginning as well as from the end of file and storing both values to metadata table. A new algorithm 'Dual Side Fixed Size Chunking' is proposed to get the high DeDuplication ratio over existing FSC. Without using computationally expensive Variable size chunking or content defined chunking, this algorithm can be effectively used for video or audio files to achieve a better DeDuplication ratio. This data reduction will provide network bandwidth savings and the ability to store more data on a given amount of disk or cloud storage. 
Reduced storage requirements will result in lower storage management and energy costs.\",\"PeriodicalId\":109537,\"journal\":{\"name\":\"2013 Third International Conference on Advances in Computing and Communications\",\"volume\":\"56 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-08-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2013 Third International Conference on Advances in Computing and Communications\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICACC.2013.10\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 Third International Conference on Advances in Computing and Communications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICACC.2013.10","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

Deduplication is a data-reduction technique that breaks streams of data down into fine-grained components, stores only the first instance of each data item on the destination media, and records all other occurrences as references in an index. Hash values are computed to identify identical data items. Fixed size chunking (FSC) is a deduplication algorithm that breaks the data into fixed-size chunks, or blocks, starting from the beginning of the file. The main disadvantage of this technique is that if new data is inserted at the front or in the middle of a file, the remaining chunks are shifted from their initial positions. This yields new hash values for the shifted chunks and therefore a lower deduplication ratio. This drawback can be overcome by computing chunk hash values from both the beginning and the end of the file and storing both sets of values in the metadata table. A new algorithm, 'Dual Side Fixed Size Chunking', is proposed to achieve a higher deduplication ratio than existing FSC. Without resorting to computationally expensive variable-size chunking or content-defined chunking, the algorithm can be applied effectively to video and audio files to achieve a better deduplication ratio. This data reduction saves network bandwidth and allows more data to be stored on a given amount of disk or cloud storage. Reduced storage requirements in turn lower storage management and energy costs.
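The abstract describes the scheme only at this level of detail; the following Python sketch is one plausible reading of it, not the authors' implementation. The chunk size, the SHA-1 digest, the set-based metadata table, and all function names are illustrative assumptions.

```python
import hashlib
import os

CHUNK_SIZE = 4096  # illustrative chunk size; the paper does not fix a value


def sha1(block: bytes) -> str:
    """Digest used to identify chunks; the abstract only says 'hash values'."""
    return hashlib.sha1(block).hexdigest()


def forward_chunks(data: bytes) -> list[bytes]:
    """Fixed-size chunks aligned from the beginning of the file (classic FSC)."""
    return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]


def backward_chunks(data: bytes) -> list[bytes]:
    """Fixed-size chunks aligned from the end of the file (the 'dual side')."""
    chunks, end = [], len(data)
    while end > 0:
        start = max(0, end - CHUNK_SIZE)
        chunks.append(data[start:end])
        end = start
    return chunks


def store(data: bytes, table: set[str]) -> None:
    """Record both forward- and backward-aligned chunk hashes in the metadata table."""
    for chunk in forward_chunks(data) + backward_chunks(data):
        table.add(sha1(chunk))


def dedup_hits(data: bytes, table: set[str]) -> int:
    """Count known chunks under each alignment and keep the better one, so an
    insertion at the front still matches via the end-aligned hashes."""
    fwd_hits = sum(sha1(c) in table for c in forward_chunks(data))
    bwd_hits = sum(sha1(c) in table for c in backward_chunks(data))
    return max(fwd_hits, bwd_hits)


if __name__ == "__main__":
    table: set[str] = set()
    original = os.urandom(16 * CHUNK_SIZE)
    store(original, table)

    # Prepend a short header: every beginning-aligned chunk boundary shifts,
    # but the end-aligned chunks of the unmodified tail keep their old hashes.
    modified = b"\x00" * 100 + original
    print(dedup_hits(modified, table))  # 16 of 17 chunks still deduplicate
```

Running the demo shows why the second alignment helps: prepending even a few bytes shifts every beginning-aligned chunk, so forward FSC finds no duplicates, while the end-aligned hashes of the untouched tail are unchanged and the file still deduplicates almost entirely.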