Ricardo Gonçalves, Paulo Sérgio Almeida, Carlos Baquero, V. Fonte
{"title":"DottedDB: Anti-Entropy without Merkle Trees, Deletes without Tombstones","authors":"Ricardo Gonçalves, Paulo Sérgio Almeida, Carlos Baquero, V. Fonte","doi":"10.1109/SRDS.2017.28","DOIUrl":null,"url":null,"abstract":"To achieve high availability in the face of network partitions, many distributed databases adopt eventual consistency, allow temporary conflicts due to concurrent writes, and use some form of per-key logical clock to detect and resolve such conflicts. Furthermore, nodes synchronize periodically to ensure replica convergence in a process called anti-entropy, normally using Merkle Trees. We present the design of DottedDB, a Dynamo-like key-value store, which uses a novel node-wide logical clock framework, overcoming three fundamental limitations of the state of the art: (1) minimize the metadata per key necessary to track causality, avoiding its growth even in the face of node churn; (2) correctly and durably delete keys, with no need for tombstones; (3) offer a lightweight anti-entropy mechanism to converge replicated data, avoiding the need for Merkle Trees. We evaluate DottedDB against MerkleDB, an otherwise identical database, but using per-key logical clocks and Merkle Trees for anti-entropy, to precisely measure the impact of the novel approach. Results show that: causality metadata per object always converges rapidly to only one id-counter pair; distributed deletes are correctly achieved without global coordination and with constant metadata; divergent nodes are synchronized faster, with less memory-footprint and with less communication overhead than using Merkle Trees.","PeriodicalId":6475,"journal":{"name":"2017 IEEE 36th Symposium on Reliable Distributed Systems (SRDS)","volume":"35 1","pages":"194-203"},"PeriodicalIF":0.0000,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE 36th Symposium on Reliable Distributed Systems (SRDS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SRDS.2017.28","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5
Abstract
To achieve high availability in the face of network partitions, many distributed databases adopt eventual consistency, allow temporary conflicts due to concurrent writes, and use some form of per-key logical clock to detect and resolve such conflicts. Furthermore, nodes synchronize periodically to ensure replica convergence in a process called anti-entropy, normally using Merkle Trees. We present the design of DottedDB, a Dynamo-like key-value store, which uses a novel node-wide logical clock framework, overcoming three fundamental limitations of the state of the art: (1) minimize the metadata per key necessary to track causality, avoiding its growth even in the face of node churn; (2) correctly and durably delete keys, with no need for tombstones; (3) offer a lightweight anti-entropy mechanism to converge replicated data, avoiding the need for Merkle Trees. We evaluate DottedDB against MerkleDB, an otherwise identical database, but using per-key logical clocks and Merkle Trees for anti-entropy, to precisely measure the impact of the novel approach. Results show that: causality metadata per object always converges rapidly to only one id-counter pair; distributed deletes are correctly achieved without global coordination and with constant metadata; divergent nodes are synchronized faster, with less memory-footprint and with less communication overhead than using Merkle Trees.