One, Two, Hash! Counting Hash Tables for Flash Devices

Proceedings of the 1st IKDD Conference on Data Sciences. Pub Date: 2014-03-21. DOI: 10.1145/2567688.2567693

Tyler Clemons, S. M. Faisal, S. Tatikonda, C. Aggarwal, S. Parthasarathy



Abstract

In recent years, advances in hardware technology have led to the increasingly widespread use of flash storage devices. Such devices have clear benefits over traditional hard drives in terms of access latency, bandwidth, and random access capabilities, particularly when reading data. However, there are some interesting tradeoffs. On a relative scale, writing to such devices can be expensive. This is because typical flash devices (NAND technology) are updated in blocks. A minor update to a given block requires the entire block to be erased (also referred to as cleaned) and then rewritten. Sequential writes, by contrast, can be two orders of magnitude faster than random writes. In addition, random writes shorten the life of the flash drive because each block can support only a limited number of cleaning operations. Hash tables are a particularly challenging case for the flash drive because this data structure inherently depends on the randomness of the hash function, as opposed to the spatial locality of the data. Thus, it is difficult to avoid random writes. In this paper, we study the design landscape for the development of a hash table for flash storage devices. We demonstrate the design tradeoffs with a hash table that uses two related hash functions, one of which exhibits a data placement property with respect to the other. Specifically, we focus on three designs based on this general philosophy and evaluate the tradeoffs among them along the axes of query performance, insert and update times, and I/O time.
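The abstract does not spell out the two hash functions, so the following is only a minimal sketch of one way a "data placement property" could work, not the authors' design. It assumes a hypothetical primary hash `h_disk` that picks a slot in the on-flash table and a related hash `h_ram` derived from it, so that every key in one RAM bucket belongs to the same flash block. Updates are buffered in RAM and merged block by block, so each touched block is rewritten once per flush instead of once per update. All names and sizes (`SLOTS_PER_BLOCK`, `NUM_BLOCKS`, `BufferedCounter`) are illustrative assumptions, and flash is simulated with in-memory dictionaries.

```python
from collections import defaultdict

SLOTS_PER_BLOCK = 4   # hypothetical: hash slots stored in one flash block
NUM_BLOCKS = 8        # hypothetical: flash blocks reserved for the table
NUM_SLOTS = SLOTS_PER_BLOCK * NUM_BLOCKS


def h_disk(key: str) -> int:
    """Primary hash (assumed): slot index of the key in the on-flash table."""
    # Simple deterministic string hash; Python's built-in hash() is salted.
    value = 0
    for ch in key:
        value = (value * 31 + ord(ch)) % NUM_SLOTS
    return value


def h_ram(key: str) -> int:
    """Related hash (assumed): RAM bucket id = flash block holding the slot."""
    return h_disk(key) // SLOTS_PER_BLOCK


class BufferedCounter:
    """Counts keys in RAM and merges them into simulated flash per block."""

    def __init__(self) -> None:
        # One dict per simulated flash block, mapping key -> stored count.
        self.flash = [defaultdict(int) for _ in range(NUM_BLOCKS)]
        # RAM buffer: block id -> {key -> pending count}.
        self.buffer = defaultdict(lambda: defaultdict(int))
        self.block_rewrites = 0

    def add(self, key: str) -> None:
        """Record one occurrence of key in the RAM buffer only."""
        self.buffer[h_ram(key)][key] += 1

    def flush(self) -> None:
        """Merge pending counts, rewriting each touched block exactly once."""
        for block_id in sorted(self.buffer):  # ascending block order
            for key, count in self.buffer[block_id].items():
                self.flash[block_id][key] += count
            self.block_rewrites += 1
        self.buffer.clear()

    def count(self, key: str) -> int:
        """Return the stored count plus any count still pending in RAM."""
        block_id = h_ram(key)
        pending = self.buffer.get(block_id, {}).get(key, 0)
        return self.flash[block_id].get(key, 0) + pending


if __name__ == "__main__":
    table = BufferedCounter()
    words = "one two hash one two one".split()
    for word in words:
        table.add(word)
    table.flush()
    print("count of 'one':", table.count("one"))
    print("block rewrites:", table.block_rewrites, "for", len(words), "updates")
```

In this sketch an unbuffered table would rewrite a block for every update, whereas the buffered version rewrites at most one block per distinct RAM bucket per flush. The price is that queries must consult both the buffer and the flash copy, which is the kind of query-versus-update tradeoff the abstract refers to.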

