Trimming the Tail for Deterministic Read Performance in SSDs
Nima Elyasi, Changho Choi, A. Sivasubramaniam, Jingpei Yang, V. Balakrishnan
2019 IEEE International Symposium on Workload Characterization (IISWC), November 2019. DOI: 10.1109/IISWC47752.2019.9042073
With SSDs becoming commonplace in many customer-facing datacenter applications, there is a critical need to optimize tail latencies (particularly for reads). In this paper, we conduct a systematic analysis, removing one bottleneck after another, to study the root causes behind long tail latencies on a state-of-the-art high-end SSD. Contrary to many prior observations, we find that Garbage Collection (GC) is not a key contributor; rather, the variance in queue lengths across the flash chips is the culprit. In particular, reads waiting behind long-latency writes, a problem that has been the target of much prior study, are at the root of the tail. While write pausing/preemption has been proposed as a remedy, in this paper we explore a simpler, alternative solution that leverages the existing RAID groups into which flash chips are organized. While a long-latency operation is ongoing, rather than waiting, a read can obtain its data by reconstructing it from the remaining chips of that group (including parity). However, reconstruction introduces additional reads, so we propose an adaptive scheduler called ATLAS that dynamically decides whether to wait or to reconstruct the data from the other chips. The resulting ATLAS optimization cuts the 99.99th-percentile read latency by as much as 10X, with an average reduction of 4X across a wide spectrum of workloads.
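The wait-or-reconstruct trade-off at the heart of the abstract can be illustrated with a short sketch. The code below is not the ATLAS scheduler from the paper; it is a minimal, hypothetical model in which the chip class, the fixed page-read latency, and the queue-based cost estimates are all assumptions made for illustration. It shows only the core idea: serve a read by RAID reconstruction when the expected time to read the remaining chips of the group beats the expected wait on the read's target chip.

```python
# Minimal sketch of the wait-vs-reconstruct decision described in the abstract.
# NOT the authors' implementation: all names, latencies, and the cost model
# below are illustrative assumptions.

from dataclasses import dataclass, field
from typing import List

READ_LATENCY_US = 80.0  # assumed flash page read time (microseconds)

@dataclass
class FlashChip:
    """A flash chip with a queue of pending operation latencies (microseconds)."""
    queue: List[float] = field(default_factory=list)
    busy_until: float = 0.0  # remaining time of the operation currently in service

    def estimated_wait(self) -> float:
        # Time a newly arriving read would spend behind in-flight work on this chip.
        return self.busy_until + sum(self.queue)

def serve_read(target: FlashChip, raid_group: List[FlashChip]) -> str:
    """Pick whichever path is expected to finish sooner: wait on the target chip,
    or rebuild the data from the rest of the RAID group (data chips + parity)."""
    wait_cost = target.estimated_wait() + READ_LATENCY_US

    # Reconstruction must read every *other* chip in the group; the reads are
    # issued in parallel, so the slowest peer chip bounds the cost.
    peers = [c for c in raid_group if c is not target]
    reconstruct_cost = max(c.estimated_wait() for c in peers) + READ_LATENCY_US

    return "reconstruct" if reconstruct_cost < wait_cost else "wait"

# Example: the target chip is stalled behind a long write/erase, its peers are idle.
group = [FlashChip(busy_until=2000.0), FlashChip(), FlashChip(), FlashChip()]
print(serve_read(group[0], group))  # -> "reconstruct"
```

Because reconstruction is gated by the slowest peer chip and adds a read to every other chip in the group, it can backfire when peer queues are also long, which is why the abstract motivates an adaptive decision rather than always reconstructing.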