CombiHeader: Minimizing the number of shim headers in redundancy elimination systems
Sumanta Saha, Andrey Lukyanenko, Antti Yla-Jaaski
2011 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), published 2011-04-10
DOI: 10.1109/INFCOMW.2011.5928920
Citations: 10
Abstract
Redundancy elimination is widely used to improve network performance. Algorithms for this purpose typically split data into chunks, fingerprint them, and compare the fingerprints against a cache to identify previously seen chunks. Matching chunks are then removed from the data and shim headers are inserted in their place. However, this approach has two crucial shortcomings. Depending on the chunk size, either many headers must be inserted, or the probability of missing similar regions increases. Algorithms that try to overcome missed similarity detection by expanding chunk boundaries suffer from excessive memory accesses due to byte-by-byte comparison. This situation leads us to propose a novel algorithm, CombiHeader, which achieves near-maximal similarity detection using smaller chunk sizes while employing a chunk aggregation technique to transmit very few headers with few memory accesses. CombiHeader uses a specialized directed graph to track and merge adjacent popular chunks. By generating successive generations of CombiNodes, CombiHeader can detect similarity regions of different lengths and uses the smallest number of headers possible. Experiments show that CombiHeader transmits fewer than 25% of the headers used by general elimination algorithms, and this figure improves as the number of cache hits grows. In certain situations, the memory accesses required to detect a maximal similarity region are in the range of 1%-5% of those of comparable algorithms. CombiHeader is implemented as a pluggable module that can be used with any existing redundancy elimination algorithm.
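The pipeline the abstract describes — chunk, fingerprint, look up in a cache, and replace runs of adjacent cache hits with a single header — can be illustrated with a minimal sketch. This is not the paper's CombiNode graph; it is a simplified stand-in that merges only immediately consecutive cached chunks, and all names here (`encode`, the `("HDR", first_id, n_chunks)` tuple, `CHUNK_SIZE`, the SHA-1 fingerprint) are illustrative assumptions, not the authors' implementation.

```python
import hashlib

CHUNK_SIZE = 64  # smaller chunks catch more similarity but emit more headers


def fingerprint(chunk: bytes) -> str:
    # SHA-1 stands in for whatever fingerprint the encoder actually uses.
    return hashlib.sha1(chunk).hexdigest()


def encode(data: bytes, cache: dict) -> list:
    """Split data into fixed-size chunks and replace each run of
    consecutively cached chunks with one aggregated header tuple."""
    out = []    # literal byte chunks or ("HDR", first_chunk_id, n_chunks)
    run = None  # current run of adjacent cache hits, if any
    for off in range(0, len(data), CHUNK_SIZE):
        chunk = data[off:off + CHUNK_SIZE]
        fp = fingerprint(chunk)
        if fp in cache:
            cid = cache[fp]
            if run is not None and cid == run[1] + run[2]:
                # Adjacent hit: extend the run instead of emitting a new header.
                run = ("HDR", run[1], run[2] + 1)
            else:
                if run is not None:
                    out.append(run)
                run = ("HDR", cid, 1)
        else:
            if run is not None:
                out.append(run)
                run = None
            cache[fp] = len(cache)  # admit the new chunk to the cache
            out.append(chunk)
    if run is not None:
        out.append(run)
    return out
```

On a second pass over previously seen data, every chunk hits the cache and consecutive hits collapse into one header, which is the effect the paper's aggregation is after, only realized here in the simplest possible form.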