CABdedupe: A Causality-Based Deduplication Performance Booster for Cloud Backup Services

2011 IEEE International Parallel & Distributed Processing Symposium. Pub Date: 2011-05-16. DOI: 10.1109/IPDPS.2011.76

Yujuan Tan, Hong Jiang, D. Feng, Lei Tian, Zhichao Yan


Citations: 64

Abstract

Due to the relatively low bandwidth of the WAN (Wide Area Network) that supports cloud backup services, both the backup time and the restore time in the cloud backup environment are in desperate need of reduction to make cloud backup a practical and affordable service for small businesses and telecommuters alike. Existing solutions that employ deduplication technology for cloud backup services focus only on removing redundant data from transmission during backup operations to reduce the backup time, while paying little attention to the restore time, which we argue is an important aspect that affects the overall quality of service of cloud backup services. In this paper, we propose a CAusality-Based deduplication performance booster for both cloud backup and restore operations, called CABdedupe, which captures the causal relationship among chronological versions of datasets that are processed in multiple backups/restores, to remove the unmodified data from transmission during not only backup operations but also restore operations, thereby improving both backup and restore performance. CABdedupe is a middleware that is orthogonal to and can be integrated into any existing backup system. Our extensive experiments, in which we integrate CABdedupe into two existing backup systems and feed them real-world datasets, show that both the backup time and the restore time are significantly reduced, with a reduction ratio of up to 103:1.
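
The idea of skipping unmodified data in both directions can be illustrated with a minimal sketch. This is not the CABdedupe implementation: the abstract does not describe how causal relationships are recorded, so the sketch assumes a simple per-file SHA-1 fingerprint map per version, and the names `snapshot` and `changed_paths` are hypothetical. Under those assumptions, a backup sends only the files that differ from the previously backed-up version, and a restore fetches only the files that differ from what the client still holds.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Content fingerprint of one file (SHA-1 chosen only for illustration)."""
    return hashlib.sha1(data).hexdigest()


def snapshot(files: dict[str, bytes]) -> dict[str, str]:
    """Map each path in a dataset version to its content fingerprint."""
    return {path: fingerprint(data) for path, data in files.items()}


def changed_paths(held: dict[str, str], wanted: dict[str, str]) -> set[str]:
    """Paths of `wanted` that are missing from, or differ in, `held`.

    Only these paths need to cross the WAN; everything else is unmodified.
    """
    return {path for path, fp in wanted.items() if held.get(path) != fp}


# Hypothetical dataset versions: v1 was backed up earlier, v2 is the current state.
v1 = {"a.txt": b"alpha", "b.txt": b"beta", "c.txt": b"gamma"}
v2 = {"a.txt": b"alpha", "b.txt": b"beta v2", "c.txt": b"gamma", "d.txt": b"delta"}

# Backup of v2: the server already holds v1, so only modified or new files are sent.
backup_send = changed_paths(snapshot(v1), snapshot(v2))

# Restore of v1: the client still holds v2, so only files that differ are fetched.
restore_fetch = changed_paths(snapshot(v2), snapshot(v1))

print(sorted(backup_send))
print(sorted(restore_fetch))
```

In this toy case the backup transmits two of four files and the restore transmits one of three. A real system would also have to handle deleted files, sub-file chunks, and fingerprint records that cannot be trusted after a failure, none of which this sketch addresses.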

