Authors: Xuming Ye, Jia Tang, Wenlong Tian, Ruixuan Li, Weijun Xiao, Yuqing Geng, Zhiyong Xu
Venue: 2021 IEEE International Conference on Networking, Architecture and Storage (NAS)
Published: 2021-10-01
DOI: 10.1109/nas51552.2021.9605398 (https://doi.org/10.1109/nas51552.2021.9605398)
Fast Variable-Grained Resemblance Data Deduplication For Cloud Storage
With the prevalence of cloud storage, data deduplication has become a widely used technology that removes duplicate data across users and saves network bandwidth. Nevertheless, traditional data deduplication can hardly detect duplicate data among similar (resemblance) chunks. Recently, a resemblance data deduplication scheme called Finesse has been proposed to detect and remove duplicate data among similar chunks efficiently. However, we observe that, owing to data locality, the chunks following a similar chunk have a high chance of also being similar, and vice versa. Processing these adjacent similar chunks at a small average chunk size increases the amount of metadata, which degrades the performance of the deduplication system. Moreover, existing resemblance data deduplication schemes ignore the performance impact of metadata. Therefore, we propose a fast variable-grained resemblance data deduplication scheme for cloud storage. It dynamically combines adjacent resemblance chunks or adjacent unique chunks, and breaks the chunks located in the transition region between resemblance chunks and unique chunks. Finally, we implement a prototype and conduct a series of experiments on real-world datasets. The results show that our method dramatically reduces the metadata size while achieving a high deduplication ratio.
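
To make the combining step more concrete, the following is a minimal Python sketch of how runs of adjacent chunks with the same classification could be coalesced so that each run needs only one metadata entry. It is an illustration under stated assumptions, not the authors' implementation: the `Chunk` record, the `"similar"`/`"unique"` labels, and the `merge_adjacent` function are hypothetical names, the chunks are assumed to be contiguous and already classified, and the breaking of chunks in the transition region is not modeled because the abstract does not specify how it is done.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List


@dataclass
class Chunk:
    # Hypothetical metadata record: one entry per stored chunk.
    offset: int   # byte offset of the chunk in the input stream
    length: int   # chunk length in bytes
    kind: str     # assumed label: "similar" (resemblance) or "unique"


def merge_adjacent(chunks: List[Chunk]) -> List[Chunk]:
    """Coalesce each run of consecutive same-kind chunks into one larger chunk.

    Assumes the chunks are sorted by offset and contiguous, so a run can be
    described by the first offset and the summed length.
    """
    merged = []
    for kind, run in groupby(chunks, key=lambda c: c.kind):
        run = list(run)
        merged.append(Chunk(run[0].offset, sum(c.length for c in run), kind))
    return merged


# Five small chunks: two similar, two unique, one similar.
chunks = [
    Chunk(0, 8, "similar"),
    Chunk(8, 8, "similar"),
    Chunk(16, 8, "unique"),
    Chunk(24, 8, "unique"),
    Chunk(32, 8, "similar"),
]
merged = merge_adjacent(chunks)
print(len(chunks), "metadata entries before,", len(merged), "after")
```

In this toy input the five fine-grained entries collapse into three coarser ones covering the same bytes, which is the kind of metadata reduction the abstract describes. How similarity is actually detected and how chunk boundaries are chosen are left out of the sketch.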