Trimming the Tail for Deterministic Read Performance in SSDs
Nima Elyasi, Changho Choi, A. Sivasubramaniam, Jingpei Yang, V. Balakrishnan
2019 IEEE International Symposium on Workload Characterization (IISWC), November 2019. DOI: 10.1109/IISWC47752.2019.9042073
With SSDs becoming commonplace in many customer-facing datacenter applications, there is a critical need to optimize tail latencies (particularly for reads). In this paper, we conduct a systematic analysis, removing one bottleneck after another, to study the root causes behind long tail latencies on a state-of-the-art high-end SSD. Contrary to many prior observations, we find that Garbage Collection (GC) is not a key contributor; rather, the variance in queue lengths across the flash chips is the culprit. In particular, reads waiting behind long-latency writes, a problem that has been the target of much prior study, are at the root of the tail. While write pausing/preemption has been proposed as a remedy, in this paper we explore a simpler, alternative solution that leverages the existing RAID groups into which flash chips are organized. While a long-latency operation is ongoing, rather than waiting, a read can obtain its data by reconstructing it from the remaining chips of that group (including parity). However, reconstruction introduces additional reads, so we propose an adaptive scheduler called ATLAS that dynamically decides whether to wait or to reconstruct the data from the other chips. The resulting ATLAS optimization cuts the 99.99th-percentile read latency by as much as 10X, with an average reduction of 4X across a wide spectrum of workloads.
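The wait-or-reconstruct trade-off at the heart of the abstract can be illustrated with a short sketch. The code below is not the ATLAS scheduler from the paper; it is a minimal, hypothetical model in which the chip class, the fixed page-read latency, and the queue-based cost estimates are all assumptions made for illustration. It shows only the core idea: serve a read by RAID reconstruction when the expected time to read the remaining chips of the group beats the expected wait on the read's target chip.

```python
# Minimal sketch of the wait-vs-reconstruct decision described in the abstract.
# NOT the authors' implementation: all names, latencies, and the cost model
# below are illustrative assumptions.

from dataclasses import dataclass, field
from typing import List

READ_LATENCY_US = 80.0  # assumed flash page read time (microseconds)

@dataclass
class FlashChip:
    """A flash chip with a queue of pending operation latencies (microseconds)."""
    queue: List[float] = field(default_factory=list)
    busy_until: float = 0.0  # remaining time of the operation currently in service

    def estimated_wait(self) -> float:
        # Time a newly arriving read would spend behind in-flight work on this chip.
        return self.busy_until + sum(self.queue)

def serve_read(target: FlashChip, raid_group: List[FlashChip]) -> str:
    """Pick whichever path is expected to finish sooner: wait on the target chip,
    or rebuild the data from the rest of the RAID group (data chips + parity)."""
    wait_cost = target.estimated_wait() + READ_LATENCY_US

    # Reconstruction must read every *other* chip in the group; the reads are
    # issued in parallel, so the slowest peer chip bounds the cost.
    peers = [c for c in raid_group if c is not target]
    reconstruct_cost = max(c.estimated_wait() for c in peers) + READ_LATENCY_US

    return "reconstruct" if reconstruct_cost < wait_cost else "wait"

# Example: the target chip is stalled behind a long write/erase, its peers are idle.
group = [FlashChip(busy_until=2000.0), FlashChip(), FlashChip(), FlashChip()]
print(serve_read(group[0], group))  # -> "reconstruct"
```

Because reconstruction is gated by the slowest peer chip and adds a read to every other chip in the group, it can backfire when peer queues are also long, which is why the abstract motivates an adaptive decision rather than always reconstructing.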