Alexandria: A Proof-of-Concept Implementation and Evaluation of Generalised Data Deduplication

2019 IEEE Globecom Workshops (GC Wkshps). Pub Date: 2019-12-01. DOI: 10.1109/GCWkshps45667.2019.9024368

Lars Nielsen, Rasmus Vestergaard, N. Yazdani, Prasad Talasila, D. Lucani, M. Sipos

Citations: 13

Abstract

The amount of data generated worldwide is expected to grow from 33 ZB to 175 ZB by 2025, driven in part by the growth of the Internet of Things (IoT) and cyber-physical systems (CPS). To cope with this enormous amount of data, new cloud storage techniques must be developed. Generalised Data Deduplication (GDD) is a new paradigm for reducing the cost of storage by systematically identifying near-identical data chunks, storing their common component once, and storing for each chunk a compact representation of its deviation from that common component. This paper presents a system architecture for GDD and a proof-of-concept implementation. We evaluated the compression gain of GDD using three data sets of varying size and content and compared it with that of the EXT4 and ZFS file systems, the latter of which employs classic deduplication. We show that GDD provides up to 16.75% compression gain compared to both EXT4 and ZFS on data sets of less than 5 GB.
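The base-plus-deviation idea described in the abstract can be illustrated with a small sketch. The abstract does not specify the transformation Alexandria uses, so everything here is an assumption made for illustration: the hypothetical `split_chunk` function simply treats the low-order bits of each byte as the deviation and the remaining bits as the common component (base), and the hypothetical `GddStore` class deduplicates bases by their SHA-256 fingerprint.

```python
import hashlib
from typing import Dict, List, Tuple

CHUNK_SIZE = 8       # bytes per chunk (illustrative value, not from the paper)
DEVIATION_BITS = 2   # low-order bits per byte treated as deviation (assumption)
MASK = (1 << DEVIATION_BITS) - 1


def split_chunk(chunk: bytes) -> Tuple[bytes, bytes]:
    """Split a chunk into (base, deviation) using a toy bit-masking transform."""
    base = bytes(b & ~MASK & 0xFF for b in chunk)   # high bits: shared component
    deviation = bytes(b & MASK for b in chunk)      # low bits: per-chunk difference
    return base, deviation


class GddStore:
    """Hypothetical in-memory store: each distinct base is kept only once."""

    def __init__(self) -> None:
        self.bases: Dict[bytes, bytes] = {}            # fingerprint -> base
        self.recipe: List[Tuple[bytes, bytes]] = []    # (fingerprint, deviation) per chunk

    def write(self, data: bytes) -> None:
        for i in range(0, len(data), CHUNK_SIZE):
            base, deviation = split_chunk(data[i:i + CHUNK_SIZE])
            fingerprint = hashlib.sha256(base).digest()
            self.bases.setdefault(fingerprint, base)   # store the base once
            # A real system would bit-pack the deviation; it is kept
            # unpacked here to keep the sketch short.
            self.recipe.append((fingerprint, deviation))

    def read(self) -> bytes:
        out = bytearray()
        for fingerprint, deviation in self.recipe:
            base = self.bases[fingerprint]
            out.extend(b | d for b, d in zip(base, deviation))
        return bytes(out)
```

With this toy transform, two chunks that differ only in the low two bits of each byte map to the same base, so the store keeps one base and two small deviations. Classic deduplication would treat the same two chunks as distinct and store both in full.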

