Design of an exact data deduplication cluster

2012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST) Pub Date: 2012-04-16 DOI: 10.1109/MSST.2012.6232380

J. Kaiser, Dirk Meister, A. Brinkmann, S. Effert

Citations: 33

Abstract

Data deduplication is an important component of enterprise storage environments. The throughput and capacity limitations of single-node solutions have led to the development of clustered deduplication systems. Most implemented clustered inline solutions trade deduplication ratio for performance and are willing to miss opportunities to detect redundant data that a single-node system would detect. We present an inline deduplication cluster with a joint distributed chunk index, which is able to detect as much redundancy as a single-node solution. The use of locality and load-balancing paradigms enables the nodes to minimize information exchange. We are therefore able to show that, despite claims to the contrary in previous papers, it is possible to combine exact deduplication, small chunk sizes, and scalability within one environment using only a commodity Gigabit Ethernet interconnect. Additionally, we investigate the throughput and scalability limitations, with a special focus on the intra-node communication.
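The idea of a joint distributed chunk index can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not describe the index layout, so the class name `ShardedChunkIndex`, the SHA-1 fingerprint, the modulo routing rule, and the in-memory dictionaries are all assumptions made for illustration. The sketch shows only why routing each fingerprint to exactly one owning shard lets a cluster detect the same duplicates as a single node.

```python
import hashlib


class ShardedChunkIndex:
    """Hypothetical in-memory stand-in for a chunk index split across nodes."""

    def __init__(self, num_nodes: int) -> None:
        # One dictionary per simulated node: fingerprint -> chunk length.
        self.shards = [dict() for _ in range(num_nodes)]

    def owner(self, fingerprint: bytes) -> int:
        # Assumed routing rule: the fingerprint alone decides the owning shard,
        # so identical chunks always meet in the same shard.
        return int.from_bytes(fingerprint[:8], "big") % len(self.shards)

    def store(self, chunk: bytes) -> bool:
        """Return True if the chunk is new, False if it is a duplicate."""
        fp = hashlib.sha1(chunk).digest()
        shard = self.shards[self.owner(fp)]
        if fp in shard:
            return False
        shard[fp] = len(chunk)
        return True


def dedup_ratio(chunks, index) -> float:
    """Logical bytes divided by bytes actually stored."""
    total = sum(len(c) for c in chunks)
    stored = sum(len(c) for c in chunks if index.store(c))
    return total / stored if stored else float("inf")


# Toy workload: four 8-byte chunks, two of them repeats of the first.
chunks = [b"a" * 8, b"b" * 8, b"a" * 8, b"a" * 8]
single_node_ratio = dedup_ratio(chunks, ShardedChunkIndex(1))
cluster_ratio = dedup_ratio(chunks, ShardedChunkIndex(4))
```

In this toy model the four-shard index and the single-shard index report the same ratio, because no duplicate can hide in a shard that was not consulted. A real cluster would additionally need chunking, persistent storage, and the locality and load-balancing mechanisms the abstract mentions, none of which are modeled here.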
