CombiHeader: Minimizing the number of shim headers in redundancy elimination systems
Sumanta Saha, Andrey Lukyanenko, Antti Yla-Jaaski
2011 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), published 2011-04-10
DOI: 10.1109/INFCOMW.2011.5928920
Citations: 10
Abstract
Redundancy elimination is widely used to improve network performance. Algorithms for this purpose typically split data into chunks, fingerprint them, and compare the fingerprints against a cache to identify previously seen chunks. Matching chunks are then removed from the data and shim headers are inserted in their place. However, this approach has two crucial shortcomings. Depending on the chunk size, either many headers must be inserted, or the probability of missing similar regions increases. Algorithms that try to overcome missed similarity detection by expanding chunk boundaries suffer from excessive memory accesses due to byte-by-byte comparison. This situation leads us to propose a novel algorithm, CombiHeader, which achieves near-maximal similarity detection using smaller chunk sizes while employing a chunk aggregation technique to transmit very few headers with few memory accesses. CombiHeader uses a specialized directed graph to track and merge adjacent popular chunks. By generating successive generations of CombiNodes, CombiHeader can detect similarity regions of different lengths and uses the smallest number of headers possible. Experiments show that CombiHeader transmits fewer than 25% of the headers used by general elimination algorithms, and this figure improves as the number of cache hits grows. In certain situations, the memory accesses required to detect a maximal similarity region are in the range of 1%-5% of those of comparable algorithms. CombiHeader is implemented as a pluggable module that can be used with any existing redundancy elimination algorithm.
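The pipeline the abstract describes — chunk, fingerprint, look up in a cache, and replace runs of adjacent cache hits with a single header — can be illustrated with a minimal sketch. This is not the paper's CombiNode graph; it is a simplified stand-in that merges only immediately consecutive cached chunks, and all names here (`encode`, the `("HDR", first_id, n_chunks)` tuple, `CHUNK_SIZE`, the SHA-1 fingerprint) are illustrative assumptions, not the authors' implementation.

```python
import hashlib

CHUNK_SIZE = 64  # smaller chunks catch more similarity but emit more headers


def fingerprint(chunk: bytes) -> str:
    # SHA-1 stands in for whatever fingerprint the encoder actually uses.
    return hashlib.sha1(chunk).hexdigest()


def encode(data: bytes, cache: dict) -> list:
    """Split data into fixed-size chunks and replace each run of
    consecutively cached chunks with one aggregated header tuple."""
    out = []    # literal byte chunks or ("HDR", first_chunk_id, n_chunks)
    run = None  # current run of adjacent cache hits, if any
    for off in range(0, len(data), CHUNK_SIZE):
        chunk = data[off:off + CHUNK_SIZE]
        fp = fingerprint(chunk)
        if fp in cache:
            cid = cache[fp]
            if run is not None and cid == run[1] + run[2]:
                # Adjacent hit: extend the run instead of emitting a new header.
                run = ("HDR", run[1], run[2] + 1)
            else:
                if run is not None:
                    out.append(run)
                run = ("HDR", cid, 1)
        else:
            if run is not None:
                out.append(run)
                run = None
            cache[fp] = len(cache)  # admit the new chunk to the cache
            out.append(chunk)
    if run is not None:
        out.append(run)
    return out
```

On a second pass over previously seen data, every chunk hits the cache and consecutive hits collapse into one header, which is the effect the paper's aggregation is after, only realized here in the simplest possible form.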