Alexandria: A Proof-of-Concept Implementation and Evaluation of Generalised Data Deduplication

Lars Nielsen, Rasmus Vestergaard, N. Yazdani, Prasad Talasila, D. Lucani, M. Sipos
{"title":"Alexandria: A Proof-of-Concept Implementation and Evaluation of Generalised Data Deduplication","authors":"Lars Nielsen, Rasmus Vestergaard, N. Yazdani, Prasad Talasila, D. Lucani, M. Sipos","doi":"10.1109/GCWkshps45667.2019.9024368","DOIUrl":null,"url":null,"abstract":"The amount of data generated worldwide is expected to grow from 33 to 175 ZB by 2025 in part driven by the growth of Internet of Things (IoT) and cyber-physical systems (CPS). To cope with this enormous amount of data, new cloud storage techniques must be developed. Generalised Data Deduplication (GDD) is a new paradigm for reducing the cost of storage by systematically identifying near identical data chunks, storing their common component once, and a compact representation of the deviation to the original chunk for each chunk. This paper presents a system architecture for GDD and a proof-of-concept implementation. We evaluated the compression gain of Generalised Data Deduplication using three data sets of varying size and content and compared to the performance of the EXT4 and ZFS file systems, where the latter employs classic deduplication. We show that Generalised Data Deduplication provide up to 16.75% compression gain compared to both EXT4 and ZFS with data sets with less than 5 GB of data.","PeriodicalId":210825,"journal":{"name":"2019 IEEE Globecom Workshops (GC Wkshps)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE Globecom Workshops (GC Wkshps)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/GCWkshps45667.2019.9024368","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 13

Abstract

The amount of data generated worldwide is expected to grow from 33 ZB to 175 ZB by 2025, driven in part by the growth of the Internet of Things (IoT) and cyber-physical systems (CPS). To cope with this enormous amount of data, new cloud storage techniques must be developed. Generalised Data Deduplication (GDD) is a new paradigm for reducing the cost of storage by systematically identifying near-identical data chunks, storing their common component once, and storing a compact representation of each chunk's deviation from that common component. This paper presents a system architecture for GDD and a proof-of-concept implementation. We evaluated the compression gain of Generalised Data Deduplication using three data sets of varying size and content, and compared it to the performance of the EXT4 and ZFS file systems, where the latter employs classic deduplication. We show that Generalised Data Deduplication provides up to 16.75% compression gain compared to both EXT4 and ZFS on data sets smaller than 5 GB.
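To illustrate the core idea described in the abstract, the following is a minimal sketch, not the authors' implementation: each chunk is split by a transformation into a "basis" (the common component, shared among near-identical chunks and stored once) and a small "deviation" (stored per chunk). The chunk size, deviation size, and the trivial split used here are all assumptions for illustration; the paper's actual transformation is more sophisticated.

```python
import hashlib

CHUNK_SIZE = 4096   # hypothetical chunk size
DEV_BYTES = 4       # hypothetical deviation size

basis_store = {}    # fingerprint -> basis bytes, each stored once
chunk_index = []    # per chunk: (basis fingerprint, deviation bytes)

def store_chunk(chunk: bytes) -> None:
    """Split a chunk into a basis and a deviation, deduplicating the basis."""
    # Toy transformation: treat the last DEV_BYTES bytes as the deviation,
    # so chunks differing only in that suffix share one stored basis.
    basis, deviation = chunk[:-DEV_BYTES], chunk[-DEV_BYTES:]
    fp = hashlib.sha256(basis).hexdigest()
    basis_store.setdefault(fp, basis)      # common component stored once
    chunk_index.append((fp, deviation))    # compact per-chunk record

def restore_chunk(i: int) -> bytes:
    """Reassemble the i-th chunk from its basis and deviation."""
    fp, deviation = chunk_index[i]
    return basis_store[fp] + deviation
```

Classic deduplication (as in ZFS) only collapses chunks that are bit-identical; the gain of GDD comes from additionally collapsing near-identical chunks whose differences are captured by the small per-chunk deviation records.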