Exploiting Intel Optane persistent memory for full text search

Proceedings of the 2021 ACM SIGPLAN International Symposium on Memory Management
Pub Date: 2021-06-22
DOI: 10.1145/3459898.3463906

Shoaib Akram


Citations: 6

Abstract

In our information-driven societies, full-text search is ubiquitous. Search is memory-intensive. Quickly searching massive corpora requires building indices, which consumes large volatile heaps. Search is also storage I/O-intensive. Limited main memory necessitates writing large partial indices to non-volatile storage, where they finally live in merged form. These indices reside in memory, in full or in part, during query evaluation. Memory and I/O intensity make it hard to index and search content rapidly and efficiently. On the hardware side, the recently introduced Intel Optane DC persistent memory (PM) offers byte-addressability, high capacity, and non-volatility. This paper evaluates and exploits Optane PM for text indexing and search on multicore platforms. We identify essential structures in inverted indices (hash table, merge tree, and key-value store), where they reside (memory or storage), and key operations over them (sort, flush, and merge). We allocate index structures in DRAM, Optane PM, and block storage by modifying an existing search engine. We then evaluate a myriad of hybrid memory and storage configurations. Our findings include: (1) careful placement of index structures across DRAM, Optane PM, and SSD speeds up indexing with a single core compared to a high-performance baseline, but does not scale to many cores; (2) crash-consistent indexing with Optane PM is feasible without incurring a high overhead; and (3) the tail latency of the longest multi-term conjunctive queries is lower with a PM-backed index than with an SSD-backed one. This paper opens up persistent memory to a practical role in full-text search.
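To make the structures and operations named in the abstract concrete, the following is a minimal sketch, not the paper's implementation. It assumes a toy indexer in which a hash table buffers postings in memory, a flush sorts the buffer into an immutable partial index, and a merge combines the partial indices into one. The names `TinyIndexer`, `flush_threshold`, and `conjunctive_query` are hypothetical, and the "storage" tier is simulated with an in-memory list rather than DRAM, Optane PM, or SSD.

```python
from collections import defaultdict
from heapq import merge as heap_merge


class TinyIndexer:
    """Toy indexer: in-memory hash table plus sorted, flushed segments."""

    def __init__(self, flush_threshold=4):
        # Hash table mapping term -> list of document ids (the volatile buffer).
        self.buffer = defaultdict(list)
        self.buffered_postings = 0
        # Assumption: flush once this many postings are buffered.
        self.flush_threshold = flush_threshold
        # Stand-in for partial indices written to non-volatile storage.
        self.segments = []

    def add(self, doc_id, text):
        # Record each distinct term of the document once.
        for term in set(text.lower().split()):
            self.buffer[term].append(doc_id)
            self.buffered_postings += 1
        if self.buffered_postings >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Sort by term and emit the buffer as one immutable partial index.
        if not self.buffer:
            return
        segment = [(term, sorted(ids)) for term, ids in sorted(self.buffer.items())]
        self.segments.append(segment)
        self.buffer = defaultdict(list)
        self.buffered_postings = 0

    def merge(self):
        # Flush what is left, then k-way merge the sorted partial indices.
        self.flush()
        merged = {}
        for term, ids in heap_merge(*self.segments):
            merged.setdefault(term, []).extend(ids)
        return {term: sorted(set(ids)) for term, ids in merged.items()}


def conjunctive_query(index, terms):
    # A conjunctive (AND) query returns documents containing every term.
    postings = [set(index.get(term, ())) for term in terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


indexer = TinyIndexer(flush_threshold=4)
indexer.add(1, "persistent memory search")
indexer.add(2, "persistent memory index")
indexer.add(3, "full text search")
merged_index = indexer.merge()
result = conjunctive_query(merged_index, ["persistent", "memory"])
```

In this sketch the placement question studied in the paper would correspond to choosing where `buffer`, `segments`, and the merged index live; the code itself does not model any memory or storage tier.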


