A Proposal for Improving Data Deduplication with Dual Side Fixed Size Chunking Algorithm
P. K. Krishnaprasad, Biju Abraham Narayamparambil
2013 Third International Conference on Advances in Computing and Communications, 2013-08-29
DOI: 10.1109/ICACC.2013.10
Deduplication is a data reduction technique that breaks streams of data into very granular components, stores only the first instance of each data item on the destination media, and records all other similar occurrences in an index. Hash values are computed to identify similar data items. Fixed size chunking (FSC) is a deduplication algorithm that breaks the data into fixed-size chunks, or blocks, starting from the beginning of the file. Its main disadvantage is that if new data is inserted at the front or in the middle of a file, the remaining chunks are shifted from their initial positions. The shifted chunks yield new hash values, which lowers the deduplication ratio. This drawback can be overcome by calculating the hash values of chunks from the beginning as well as from the end of the file and storing both sets of values in the metadata table. A new algorithm, 'Dual Side Fixed Size Chunking', is proposed to achieve a higher deduplication ratio than existing FSC. Without resorting to computationally expensive variable size chunking or content defined chunking, this algorithm can be used effectively on video or audio files to achieve a better deduplication ratio. This data reduction provides network bandwidth savings and the ability to store more data on a given amount of disk or cloud storage. Reduced storage requirements result in lower storage management and energy costs.
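The idea of hashing chunks from both ends can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no algorithmic detail, so the function names (`front_chunks`, `back_chunks`, `build_index`, `duplicate_bytes`), the use of SHA-256, the single set standing in for the metadata table, and the 4-byte chunk size are all assumptions chosen for illustration.

```python
import hashlib


def front_chunks(data: bytes, size: int):
    """Yield (start, end) offsets of fixed-size chunks aligned to the start."""
    for start in range(0, len(data), size):
        yield start, min(start + size, len(data))


def back_chunks(data: bytes, size: int):
    """Yield (start, end) offsets of fixed-size chunks aligned to the end."""
    end = len(data)
    while end > 0:
        start = max(0, end - size)
        yield start, end
        end = start


def chunk_hash(data: bytes, start: int, end: int) -> str:
    # SHA-256 is an assumption; the abstract does not name a hash function.
    return hashlib.sha256(data[start:end]).hexdigest()


def build_index(data: bytes, size: int) -> set:
    """Hypothetical metadata table: hashes of front- and back-aligned chunks."""
    index = set()
    for start, end in front_chunks(data, size):
        index.add(chunk_hash(data, start, end))
    for start, end in back_chunks(data, size):
        index.add(chunk_hash(data, start, end))
    return index


def duplicate_bytes(data: bytes, size: int, index: set, dual_side: bool = True) -> int:
    """Count bytes of `data` covered by chunks whose hash is already indexed."""
    covered = bytearray(len(data))  # one flag per byte, avoids double counting
    spans = list(front_chunks(data, size))
    if dual_side:
        spans += list(back_chunks(data, size))
    for start, end in spans:
        if chunk_hash(data, start, end) in index:
            covered[start:end] = b"\x01" * (end - start)
    return sum(covered)


original = b"AAAABBBBCCCCDDDD"
modified = b"AAAAXXBBBBCCCCDDDD"  # two bytes inserted after the first chunk
index = build_index(original, 4)

print(duplicate_bytes(modified, 4, index, dual_side=False))  # 4
print(duplicate_bytes(modified, 4, index, dual_side=True))   # 16
```

In this toy case, front-aligned chunking alone recognises only the first chunk of the modified file (4 of 18 bytes), because the insertion shifts every later chunk boundary. Chunking from the end as well recovers the three trailing chunks, so 16 of 18 bytes are identified as duplicates. Data lying between two insertions would still be missed by both passes.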