SILK+ Preventing Latency Spikes in Log-Structured Merge Key-Value Stores Running Heterogeneous Workloads

ACM Transactions on Computer Systems (TOCS), Pub Date: 2020-05-30, DOI: 10.1145/3380905

Oana Balmau, Florin Dinu, W. Zwaenepoel, Karan Gupta, Ravishankar Chandhiramoorthi, Diego Didona


Citations: 93

Abstract

Log-Structured Merge Key-Value stores (LSM KVs) are designed to offer good write performance by capturing client writes in memory and only later flushing them to storage. Writes are later compacted into a tree-like data structure on disk to improve read performance and to reduce storage space use. It has been widely documented that compactions severely hamper throughput. Various optimizations have successfully dealt with this problem. These techniques include, among others, rate-limiting flushes and compactions, selecting among compactions for maximum effect, and limiting compactions to the highest level by so-called fragmented LSMs. In this article, we focus on latencies rather than throughput. We first document that LSM KVs exhibit high tail latencies. The techniques that have been proposed for optimizing throughput do not address this issue and, in some cases, exacerbate it. The root cause of these high tail latencies is interference between client writes, flushes, and compactions. Another major cause of tail latency is the heterogeneity of workloads in terms of operation mix and item sizes, whereby a few computationally heavier requests slow down the vast majority of smaller requests. We introduce the notion of an Input/Output (I/O) bandwidth scheduler for an LSM-based KV store to reduce tail latency caused by interference from flushes and compactions and by workload heterogeneity. We explore three techniques as part of this I/O scheduler: (1) opportunistically allocating more bandwidth to internal operations during periods of low load, (2) prioritizing flushes and compactions at the lower levels of the tree, and (3) separating client requests by size and by data access path. SILK+ is a new open-source LSM KV that incorporates this notion of an I/O scheduler.
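The three scheduler techniques listed in the abstract can be illustrated with a small sketch. This is not the SILK+ implementation: every name here (`InternalOp`, `internal_budget`, `priority`, `plan`, `route`), the bandwidth floor, and the 4096-byte size threshold are hypothetical choices made only to show the shape of the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InternalOp:
    """Hypothetical description of a pending flush or compaction."""
    kind: str        # "flush" or "compaction"
    level: int       # tree level the operation works on (0 for flushes)
    size_mb: float   # amount of I/O the operation still needs


def internal_budget(total_mbps: float, client_mbps: float, floor_mbps: float) -> float:
    """Technique (1): internal operations get the bandwidth clients are not using.

    The floor is an assumption of this sketch; it keeps internal work from
    being starved completely under heavy client load.
    """
    return max(floor_mbps, total_mbps - client_mbps)


def priority(op: InternalOp) -> tuple:
    """Technique (2): flushes first, then compactions from the lowest level up."""
    return (0, 0) if op.kind == "flush" else (1, op.level)


def plan(ops, budget_mb: float):
    """Grant I/O to operations in priority order until the budget runs out."""
    granted = []
    for op in sorted(ops, key=priority):
        if budget_mb <= 0:
            break
        share = min(op.size_mb, budget_mb)
        granted.append((op, share))
        budget_mb -= share
    return granted


def route(request_bytes: int, in_memory: bool, small_limit: int = 4096) -> str:
    """Technique (3): pick a queue by request size and by data access path."""
    size = "small" if request_bytes <= small_limit else "large"
    path = "memory" if in_memory else "disk"
    return f"{size}-{path}"


if __name__ == "__main__":
    pending = [
        InternalOp("compaction", 2, 50.0),
        InternalOp("flush", 0, 20.0),
        InternalOp("compaction", 0, 30.0),
    ]
    budget = internal_budget(total_mbps=200.0, client_mbps=160.0, floor_mbps=10.0)
    for op, share in plan(pending, budget):
        print(op.kind, op.level, share)
    print(route(512, in_memory=True), route(65536, in_memory=False))
```

In this sketch the budget grows automatically when client load drops, which is one simple way to read "opportunistically allocating more bandwidth"; a real scheduler would also need to measure client load over time and to pause or preempt lower-priority compactions.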
