DottedDB: Anti-Entropy without Merkle Trees, Deletes without Tombstones

2017 IEEE 36th Symposium on Reliable Distributed Systems (SRDS). Pub Date: 2017-09-01. DOI: 10.1109/SRDS.2017.28

Ricardo Gonçalves, Paulo Sérgio Almeida, Carlos Baquero, V. Fonte

Pages: 194-203

Citations: 5

Abstract

To achieve high availability in the face of network partitions, many distributed databases adopt eventual consistency, allow temporary conflicts due to concurrent writes, and use some form of per-key logical clock to detect and resolve such conflicts. Furthermore, nodes synchronize periodically to ensure replica convergence in a process called anti-entropy, normally using Merkle Trees. We present the design of DottedDB, a Dynamo-like key-value store that uses a novel node-wide logical clock framework to overcome three fundamental limitations of the state of the art. It (1) minimizes the per-key metadata necessary to track causality, avoiding its growth even in the face of node churn; (2) deletes keys correctly and durably, with no need for tombstones; and (3) offers a lightweight anti-entropy mechanism to converge replicated data, avoiding the need for Merkle Trees. To precisely measure the impact of the novel approach, we evaluate DottedDB against MerkleDB, an otherwise identical database that uses per-key logical clocks and Merkle Trees for anti-entropy. Results show that causality metadata per object always converges rapidly to only one id-counter pair; that distributed deletes are correctly achieved without global coordination and with constant metadata; and that divergent nodes are synchronized faster, with a smaller memory footprint and less communication overhead, than when using Merkle Trees.
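The general idea of tagging each write with one id-counter pair and synchronizing by comparing node-wide clocks can be sketched as follows. This is a rough illustration only, not the DottedDB algorithm: the names `Node`, `put`, `missing_dots`, and `pull` are hypothetical, the clock is kept as a plain set of pairs rather than a compact representation, and concurrent writes and deletes are ignored.

```python
from dataclasses import dataclass, field

# A "dot" is one id-counter pair: (node id, per-node write counter).
Dot = tuple[str, int]


@dataclass
class Node:
    """Hypothetical replica holding a node-wide clock and a key-value store."""
    node_id: str
    counter: int = 0
    # Node-wide clock: every dot this node has seen (assumption: plain set).
    seen: set[Dot] = field(default_factory=set)
    # Each key carries exactly one dot alongside its value.
    store: dict[str, tuple[Dot, str]] = field(default_factory=dict)

    def put(self, key: str, value: str) -> Dot:
        """Local write: mint a fresh dot and attach it to the key."""
        self.counter += 1
        dot = (self.node_id, self.counter)
        self.seen.add(dot)
        self.store[key] = (dot, value)
        return dot


def missing_dots(local: Node, remote: Node) -> set[Dot]:
    """Dots the remote node has seen that the local node has not."""
    return remote.seen - local.seen


def pull(local: Node, remote: Node) -> int:
    """Anti-entropy step: copy only the versions behind missing dots."""
    wanted = missing_dots(local, remote)
    for key, (dot, value) in remote.store.items():
        if dot in wanted:
            local.store[key] = (dot, value)
    # Overwritten dots carry no data but are still marked as seen.
    local.seen |= wanted
    return len(wanted)


if __name__ == "__main__":
    a, b = Node("a"), Node("b")
    a.put("k1", "v1")
    a.put("k1", "v2")  # supersedes the first write to k1
    a.put("k2", "x")
    pull(b, a)
    print(b.store == a.store, missing_dots(b, a) == set())
```

In this sketch the two replicas only exchange clock differences and the versions behind them, so no hash tree over the key space is needed; a real system would also have to handle concurrent versions of the same key.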

