One, Two, Hash! Counting Hash Tables for Flash Devices

Tyler Clemons, S. M. Faisal, S. Tatikonda, C. Aggarwal, S. Parthasarathy
Proceedings of the 1st IKDD Conference on Data Sciences, published 2014-03-21. DOI: 10.1145/2567688.2567693 (https://doi.org/10.1145/2567688.2567693)

Abstract

In recent years, advances in hardware technology have led to the increasingly widespread use of flash storage devices. Such devices have clear benefits over traditional hard drives in access latency, bandwidth, and random-access capability, particularly when reading data. However, there are some interesting tradeoffs. In relative terms, writing to such devices can be expensive, because typical flash devices (NAND technology) are updated in blocks: even a minor update to a block requires the entire block to be erased (also referred to as cleaned) and then rewritten. Sequential writes, by contrast, can be two orders of magnitude faster than random writes. In addition, random writes shorten the life of the flash drive, because each block supports only a limited number of cleaning operations. Hash tables are a particularly challenging case for flash drives because the data structure inherently depends on the randomness of the hash function rather than on the spatial locality of the data, which makes random writes difficult to avoid. In this paper, we study the design landscape for hash tables on flash storage devices. We demonstrate the design tradeoffs using two related hash functions, one of which exhibits a data placement property with respect to the other. Specifically, we focus on three designs based on this general philosophy and evaluate the tradeoffs among them along the axes of query performance, insert and update times, and I/O time.
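
The abstract does not spell out the three designs, so the following Python sketch is only one possible reading of the "two related hash functions" idea, not the authors' method. It assumes a hypothetical `slot_hash` that picks a global slot and a `block_hash` derived from it, so that all slots sharing a block are contiguous. Updates are buffered in memory and grouped by block, and each dirty block is rewritten once per flush instead of once per update. The class name, parameters, and the use of CRC32 are all illustrative assumptions.

```python
from collections import defaultdict
import zlib


class BufferedCountingTable:
    """Toy counting hash table that batches updates per simulated flash block."""

    def __init__(self, num_blocks=4, slots_per_block=8, buffer_limit=16):
        self.num_blocks = num_blocks
        self.slots_per_block = slots_per_block
        self.buffer_limit = buffer_limit
        # Simulated flash: flash[block][slot] is a small dict of key -> count.
        self.flash = [[{} for _ in range(slots_per_block)]
                      for _ in range(num_blocks)]
        # In-memory buffer: block -> key -> pending increment.
        self.buffer = defaultdict(lambda: defaultdict(int))
        self.pending = 0
        self.block_writes = 0  # number of simulated erase-and-rewrite cycles

    def slot_hash(self, key):
        # Assumed primary hash: a global slot index over the whole table.
        return zlib.crc32(key.encode()) % (self.num_blocks * self.slots_per_block)

    def block_hash(self, key):
        # Assumed secondary hash, derived from the first so that keys in the
        # same block occupy a contiguous range of slots.
        return self.slot_hash(key) // self.slots_per_block

    def add(self, key, n=1):
        # Buffer the increment in memory; no flash write happens here.
        self.buffer[self.block_hash(key)][key] += n
        self.pending += 1
        if self.pending >= self.buffer_limit:
            self.flush()

    def flush(self):
        # Apply all pending increments, rewriting each dirty block once.
        for block, updates in self.buffer.items():
            for key, n in updates.items():
                slot = self.slot_hash(key) % self.slots_per_block
                chain = self.flash[block][slot]
                chain[key] = chain.get(key, 0) + n
            self.block_writes += 1
        self.buffer.clear()
        self.pending = 0

    def count(self, key):
        # Combine the flushed count with any increment still in the buffer.
        block = self.block_hash(key)
        slot = self.slot_hash(key) % self.slots_per_block
        stored = self.flash[block][slot].get(key, 0)
        return stored + self.buffer.get(block, {}).get(key, 0)


table = BufferedCountingTable()
for word in "a b a c a b".split():
    table.add(word)
table.flush()
print(table.count("a"), table.block_writes)
```

In this toy run, six increments touch three distinct keys, so the flush rewrites at most three blocks, whereas writing each update through immediately would rewrite a block six times. The cost is that queries must consult the in-memory buffer as well as flash, which is one instance of the query-versus-update tradeoff the abstract mentions.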