SAFE: Structure-aware file and email deduplication for cloud-based storage systems

Daehee Kim, Sejun Song, Baek-Young Choi
Published in: 2013 IEEE 2nd International Conference on Cloud Networking (CloudNet), 2013-11-01
DOI: 10.1109/CloudNet.2013.6710567
Citations: 19

Abstract

Cloud-based storage services have become considerably popular in recent years, as they enable data access from anywhere, on any device, at any time. Many leading cloud-based storage services, including Dropbox, JustCloud, and Mozy, use data deduplication at the source to save both network bandwidth from the user to the cloud servers and storage space, which in turn speeds up data upload. Although traditional variable-size block-level deduplication techniques tend to achieve a high data reduction rate, they incur a high processing overhead due to data chunking, index processing, and data fragmentation. A user's device, however, may be too limited in processing capability and memory space to perform effective client-side deduplication. While simple file-level or large fixed-size block-level deduplication may be able to cope with the limited capacity of the source device, it cannot produce a high data reduction rate. In this paper, we propose a novel Structure-Aware File and Email deduplication (SAFE) scheme that achieves both fast and effective data reduction for cloud-based storage services. SAFE efficiently deduplicates redundant objects in structured files as well as emails by exploiting object-level components based on their structures. Our evaluation using real data sets of structured files and emails shows that SAFE achieves storage savings as good as those of variable-size block deduplication, while being as fast as file-level or large fixed-size block-level deduplication.
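
To make the contrast between file-level and object-level deduplication concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes that the "objects" of an email are its MIME parts (body and attachments) and that objects are identified by a SHA-256 digest; the names `ObjectStore`, `dedup_file_level`, `dedup_email_objects`, and `make_email` are hypothetical helpers introduced only for illustration.

```python
import hashlib
from email import message_from_bytes
from email.message import EmailMessage


def digest(data):
    """Content fingerprint used as the deduplication key (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


class ObjectStore:
    """Toy content-addressed store: each unique object is kept exactly once."""

    def __init__(self):
        self.objects = {}  # digest -> bytes

    def put(self, data):
        key = digest(data)
        if key not in self.objects:  # only previously unseen content is stored
            self.objects[key] = data
        return key

    def stored_bytes(self):
        return sum(len(v) for v in self.objects.values())


def dedup_file_level(store, raw):
    """File-level deduplication: the whole message is a single object."""
    return [store.put(raw)]


def dedup_email_objects(store, raw):
    """Object-level deduplication: each leaf MIME part is its own object."""
    msg = message_from_bytes(raw)
    keys = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        payload = part.get_payload(decode=True) or b""
        keys.append(store.put(payload))
    return keys


def make_email(to, body, attachment):
    """Build a small multipart message with one binary attachment."""
    msg = EmailMessage()
    msg["From"] = "alice@example.com"
    msg["To"] = to
    msg["Subject"] = "Report"
    msg.set_content(body)
    msg.add_attachment(attachment, maintype="application",
                       subtype="octet-stream", filename="report.bin")
    return msg.as_bytes()


# Two emails with different recipients and bodies but the same attachment.
attachment = bytes(range(256)) * 40
mail_a = make_email("bob@example.com", "Hi Bob", attachment)
mail_b = make_email("carol@example.com", "Hi Carol", attachment)

file_store, object_store = ObjectStore(), ObjectStore()
for raw in (mail_a, mail_b):
    dedup_file_level(file_store, raw)
keys_a = dedup_email_objects(object_store, mail_a)
keys_b = dedup_email_objects(object_store, mail_b)

print("file-level bytes stored:  ", file_store.stored_bytes())
print("object-level bytes stored:", object_store.stored_bytes())
```

In this sketch the two messages differ as whole files, so file-level deduplication stores both in full, whereas the object-level pass stores the shared attachment once and only the two small bodies separately. Header handling and the structured-file formats covered by the paper are deliberately left out.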