Search-in-Memory: Reliable, Versatile, and Efficient Data Matching in SSD's NAND Flash Memory Chip for Data Indexing Acceleration
Yun-Chih Chen; Yuan-Hao Chang; Tei-Wei Kuo
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 43, no. 11, pp. 3864-3875
DOI: 10.1109/TCAD.2024.3443702
Published: 2024-11-06 (Journal Article)
JCR: Q2 (Computer Science, Hardware & Architecture)
URL: https://ieeexplore.ieee.org/document/10745821/
Citations: 0
Abstract
To index the growing volume of data, modern data indexes are typically stored on solid-state drives and cached in DRAM. Searching such an index, however, generates significant I/O traffic due to limited access locality and inefficient cache utilization. At the heart of index searching is filtering through vast spans of data to isolate a small, relevant subset, an operation that requires only basic equality tests rather than the complex arithmetic modern CPUs provide. This article demonstrates the feasibility of performing data filtering directly within a NAND flash memory chip, transmitting only the relevant search results rather than complete pages. Instead of adding complex circuits, we propose repurposing existing circuitry for efficient and accurate bitwise parallel matching. We demonstrate how different data structures can use our flexible SIMD command interface to offload index searches. This strategy not only frees the CPU for more computationally demanding tasks but also optimizes DRAM usage for write buffering, significantly lowering the energy consumed by I/O transfers between the CPU and DRAM. Extensive testing across a wide range of workloads shows up to a 9× speedup in write-heavy workloads and up to 45% energy savings from reduced read and write I/O. Furthermore, we achieve significant reductions in median and tail read latencies of up to 89% and 85%, respectively.
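The core operation the abstract describes, equality filtering inside the flash chip so that only matching entries leave the device, can be illustrated with a minimal software model. The sketch below is a hypothetical illustration, not the paper's actual circuit design: it models a page as a list of fixed-width words and uses the bitwise identity that two words are equal exactly when their XOR is zero, which is the kind of simple test (no arithmetic) the paper argues existing flash circuitry can be repurposed to perform in parallel. The function name `match_page` and the `mask` parameter are assumptions for illustration.

```python
def match_page(page_words, key, mask=0xFFFFFFFF):
    """Return the offsets of words in a flash page that equal `key`.

    Models in-flash filtering: equality reduces to the bitwise test
    (word ^ key) & mask == 0, so only matching offsets (not the whole
    page) would need to cross the flash channel to the host.
    The optional `mask` restricts the comparison to selected bits,
    mimicking a partial-field match.
    """
    hits = []
    for offset, word in enumerate(page_words):
        if (word ^ key) & mask == 0:  # all masked bits identical
            hits.append(offset)
    return hits


# A toy 4-word "page": the host receives [0, 2], not four full words.
page = [0xDEAD, 0xBEEF, 0xDEAD, 0x1234]
print(match_page(page, 0xDEAD))  # -> [0, 2]
```

In hardware, the per-word XOR tests would run in parallel across the page buffer rather than in a Python loop; the loop here only makes the bit-level logic explicit.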
Journal Introduction:
The purpose of this Transactions is to publish papers of interest to individuals in the area of computer-aided design of integrated circuits and systems composed of analog, digital, mixed-signal, optical, or microwave components. The aids include methods, models, algorithms, and man-machine interfaces for system-level, physical, and logical design, including planning, synthesis, partitioning, modeling, simulation, layout, verification, testing, hardware-software co-design, and documentation of integrated circuit and system designs of all complexities. Design tools and techniques for evaluating and designing integrated circuits and systems for metrics such as performance, power, reliability, testability, and security are a focus.