BloomStore: Bloom-Filter based memory-efficient key-value store for indexing of data deduplication on flash

2012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST). Pub Date: 2012-04-16. DOI: 10.1109/MSST.2012.6232390

Guanlin Lu, Youngjin Nam, D. Du


Citations: 66

Abstract

Owing to their better scalability, key-value (KV) stores have superseded traditional relational databases in many applications, such as data deduplication, online multiplayer gaming, and Internet services like Amazon and Facebook. A KV store efficiently supports two operations, key lookup and KV pair insertion, through an index structure that maps keys to their associated values. KV stores are also commonly used to implement the chunk index in data deduplication, where a chunk ID (a SHA-1 value computed from the chunk's content) is the key and its associated chunk metadata (e.g., physical storage location, stream ID) is the value. In a deduplication system, the number of chunks is typically too large for the KV store to be held entirely in RAM. Thus, the KV store maintains a large (hash-table-based) index structure in RAM to index all KV pairs stored on secondary storage. Hence, the available RAM space limits the maximum number of KV pairs that can be stored. Moving the index data structure from RAM to flash can potentially overcome this space limitation. In this paper, we propose BloomStore, an efficient KV store on flash with a Bloom-filter-based index structure. The unique features of BloomStore are that (1) no index structure needs to be stored in RAM, so a small amount of RAM can support a large number of KV pairs, and (2) both the index structure and the KV pairs are stored compactly on flash memory to improve performance. In our experiments with deduplication workloads, BloomStore achieves significantly better key lookup performance than state-of-the-art KV store designs and roughly the same insertion performance, while using several times less RAM.
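The abstract does not spell out BloomStore's internal layout, so the following is only a minimal sketch of the general idea it names: a Bloom filter acts as a cheap membership test that decides whether a page of KV pairs is worth reading at all. Every name here (`BloomFilter`, `PagedKVStore`, `pairs_per_page`, `page_reads`) is hypothetical, in-memory dicts stand in for flash pages, and the filter parameters are arbitrary; none of this is taken from the paper.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter using double hashing over a SHA-1 digest."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes):
        digest = hashlib.sha1(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        # Force an odd step so the stride is never zero.
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


class PagedKVStore:
    """Toy store: each simulated flash page is guarded by its own filter."""

    def __init__(self, pairs_per_page: int = 64):
        self.pairs_per_page = pairs_per_page
        self.pages = []       # list of (BloomFilter, dict); stands in for flash
        self.page_reads = 0   # counts simulated flash page reads
        self._new_page()

    def _new_page(self) -> None:
        self.pages.append((BloomFilter(), {}))

    def insert(self, key: bytes, value) -> None:
        bloom, page = self.pages[-1]
        if len(page) >= self.pairs_per_page:
            self._new_page()
            bloom, page = self.pages[-1]
        bloom.add(key)
        page[key] = value

    def lookup(self, key: bytes):
        # Probe newest pages first so the latest value for a key wins.
        for bloom, page in reversed(self.pages):
            if bloom.might_contain(key):
                self.page_reads += 1  # only now would flash be read
                if key in page:
                    return page[key]
        return None


store = PagedKVStore(pairs_per_page=4)
for i in range(10):
    chunk_id = hashlib.sha1(f"chunk-{i}".encode()).digest()
    store.insert(chunk_id, {"location": i})
```

In a deduplication setting, a lookup that returns `None` would mean the chunk is new and must be stored, while a hit returns the existing chunk's metadata. A Bloom filter can report false positives but never false negatives, so the sketch may occasionally read a page that does not hold the key, but it never misses a key that was inserted.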

