Duwon Hong, Myungsuk Kim, Jisung Park, Myoungsoo Jung, Jihong Kim
{"title":"Improving SSD Performance Using Adaptive Restricted-Copyback Operations","authors":"Duwon Hong, Myungsuk Kim, Jisung Park, Myoungsoo Jung, Jihong Kim","doi":"10.1109/NVMSA.2019.8863524","DOIUrl":"https://doi.org/10.1109/NVMSA.2019.8863524","url":null,"abstract":"Copyback operations can improve the performance of data migrations in SSDs, but they are rarely used because of their error propagation problem. In this paper, we propose an integrated approach that maximizes the efficiency of copyback operations but does not compromise data reliability. First, we propose a novel per-block error propagation model under consecutive copyback operations. Our model significantly increases the number of successive copybacks by exploiting the aging characteristics of NAND blocks. Second, we devise a resource-efficient error management scheme that can handle successive copybacks where pages move around multiple blocks with different reliability. Experimental results show that the proposed technique can improve the IO throughput by up to 25% over the existing technique.","PeriodicalId":438544,"journal":{"name":"2019 IEEE Non-Volatile Memory Systems and Applications Symposium (NVMSA)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134231615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NVMSA 2019 Message from the General Co-Chairs","authors":"","doi":"10.1109/nvmsa.2019.8863512","DOIUrl":"https://doi.org/10.1109/nvmsa.2019.8863512","url":null,"abstract":"","PeriodicalId":438544,"journal":{"name":"2019 IEEE Non-Volatile Memory Systems and Applications Symposium (NVMSA)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114260173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Host-Level Workload-Aware Budget Compensation I/O Scheduling for Open-Channel SSDs","authors":"Sooyun Lee, Kyuhwa Han, Dongkun Shin","doi":"10.1109/NVMSA.2019.8863515","DOIUrl":"https://doi.org/10.1109/NVMSA.2019.8863515","url":null,"abstract":"In datacenters and cloud computing, Quality of Service (QoS) is an essential concept as access to shared resources, including solid state drives (SSDs), must be ensured. The previously proposed workload-aware budget compensation (WA-BC) scheduling algorithm is a device I/O scheduler for guaranteeing performance isolation among multiple virtual machines sharing an SSD. This paper aims to resolve the following three shortcomings of WA-BC: (1) it is applicable to only SR-IOV supporting SSDs, (2) it is unfit for various types of workloads, and (3) it manages flash memory blocks separately in an inappropriate manner. We propose the host-level WA-BC (hWA-BC) scheduler, which aims to achieve performance isolation between multiple processes sharing an open-channel SSD.","PeriodicalId":438544,"journal":{"name":"2019 IEEE Non-Volatile Memory Systems and Applications Symposium (NVMSA)","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124664156","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance Evaluation on NVMM Emulator Employing Fine-Grain Delay Injection","authors":"Yusuke Omori, K. Kimura","doi":"10.1109/NVMSA.2019.8863522","DOIUrl":"https://doi.org/10.1109/NVMSA.2019.8863522","url":null,"abstract":"The emerging technology of byte-addressable nonvolatile memory chips is expected to enable larger main memory and lower power consumption than the traditional DRAM. It also realizes durable data structures without ordinary file systems. However, while enumerating the advantages of nonvolatile main memory (NVMM), its expensive write latency and higher energy consumption in comparison with DRAM must be considered. These special characteristics of NVMM require new compiler techniques and OS support as well as new memory architectures. Several NVMM emulators built on real machines have been proposed to facilitate such software and hardware research. Their designs were originally based on a simple coarse-grain delay model that injected additional clock cycles in the read and write requests sent to the memory controller. However, they could not utilize bank-level parallelism and row-buffer access locality, relied on by today’s memory modules, to exploit their performance. Therefore, a fine-grain delay model was recently proposed where the delay is injected for the primitive memory operations issued by the memory controller. In this paper, we implement both the coarse-grain and the fine-grain delay models on an SoC-FPGA board along with the use of Linux kernel modifications and several runtime functions. Then, the program behavior differences between the two models are evaluated with SPEC CPU programs. The fine-grain model reveals the program execution time is influenced by the frequency of NVMM memory requests rather than the cache hit ratio. Bank-level parallelism and row-buffer access locality also affect the memory access delay, and the fine-grain model shows lower execution time for four of fourteen programs than the coarse-grain even when the former has longer total write latency.","PeriodicalId":438544,"journal":{"name":"2019 IEEE Non-Volatile Memory Systems and Applications Symposium (NVMSA)","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131266572","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lei Han, Shangzhen Tan, Bin Xiao, Chenlin Ma, Z. Shao
{"title":"Optimizing Cauchy Reed-Solomon Coding via ReRAM Crossbars in SSD-based RAID Systems","authors":"Lei Han, Shangzhen Tan, Bin Xiao, Chenlin Ma, Z. Shao","doi":"10.1109/NVMSA.2019.8863519","DOIUrl":"https://doi.org/10.1109/NVMSA.2019.8863519","url":null,"abstract":"Erasure codes such as Cauchy Reed-Solomon codes have been gaining ever-increasing importance for fault tolerance in SSD-based RAID arrays. However, erasure coding on a processor-based RAID controller relies on Galois Field arithmetic to perform matrix-vector multiplication, which increases the computation complexity and leads to a huge number of memory accesses. In this paper, we investigate utilizing ReRAM to improve erasure coding performance. We propose Re-RAID which uses ReRAM as main memory in both RAID and SSD controllers, in which erasure coding can be processed on ReRAM. We also propose a confluent Cauchy-Vandermonde matrix as the generator matrix for encoding. By doing this, Re-RAID can distribute the reconstruction tasks for a single failure to SSDs, and then SSDs can recover the data with ReRAM memory. Experimental results show that we can improve the encoding and decoding performance by up to 598× and 251×, respectively.","PeriodicalId":438544,"journal":{"name":"2019 IEEE Non-Volatile Memory Systems and Applications Symposium (NVMSA)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133707990","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cheng Ji, Lun Wang, Qiao Li, Congming Gao, Liang Shi, Chia-Lin Yang, C. Xue
{"title":"Fair Down to the Device: A GC-Aware Fair Scheduler for SSD","authors":"Cheng Ji, Lun Wang, Qiao Li, Congming Gao, Liang Shi, Chia-Lin Yang, C. Xue","doi":"10.1109/NVMSA.2019.8863523","DOIUrl":"https://doi.org/10.1109/NVMSA.2019.8863523","url":null,"abstract":"Solid-state drives (SSDs) are the mainstream solution for massive data storage today. For modern computer systems, fair resource assignment is a critical design consideration and has drawn great interest in recent years. Although several I/O fairness schedulers have been proposed on the host side for SSDs, process fairness could still be dramatically degraded if garbage collection (GC) is triggered on the device side. A GC operation could block I/O requests, which causes unpredictable read/write latency variation and further impacts fairness between processes. This paper proposes Fair-GC, a novel coordinated host and device I/O scheduling strategy to achieve true fairness considering GC interferences. The key idea is to orchestrate GC operations inside SSDs carefully such that the performance of a process is penalized by GC to the same (or a comparable) degree as when it runs alone. In this way, the I/O fairness maintained by the host-side scheduler can be preserved in the presence of GC. Furthermore, our scheduler ensures that the timeslice of a process maintained at the host-side scheduler is updated in a timely manner to avoid unnecessary slowdown for maintaining fairness. Experimental results with a wide range of workloads verify that the proposed technique can achieve fairness as well as improve the throughput significantly. Compared to a conventional fairness-based I/O scheduler, Fair-GC can reduce the slowdown of real applications by up to 99% and improve the throughput by as much as 225%.","PeriodicalId":438544,"journal":{"name":"2019 IEEE Non-Volatile Memory Systems and Applications Symposium (NVMSA)","volume":"34 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114124724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Aijiao Cui, Zhenxing Chang, Ziming Wang, G. Qu, Huawei Li
{"title":"A Memristor-based Scan Hold Flip-Flop","authors":"Aijiao Cui, Zhenxing Chang, Ziming Wang, G. Qu, Huawei Li","doi":"10.1109/NVMSA.2019.8863517","DOIUrl":"https://doi.org/10.1109/NVMSA.2019.8863517","url":null,"abstract":"Scan-based design-for-testability (DfT) has been widely adopted in modern integrated circuit (IC) design to facilitate manufacturing testing. However, the transitions in scan cells result in high test power consumption during testing. The scan hold flip-flop (SHFF) can insulate the transitions in the scan chain from the circuit under test to reduce test power, but it incurs considerable area overhead. We propose to solve this problem by adopting a memristor-based D flip-flop (DFF) into SHFF. The new design breaks down the design structure of conventional CMOS scan cells and adopts memristors into SHFF to reduce the number of transistors and hence the chip area. The functionality of the proposed design is verified to be correct by HSPICE simulation. Compared with the conventional SHFF cells, the area overhead is reduced by 26.5%.","PeriodicalId":438544,"journal":{"name":"2019 IEEE Non-Volatile Memory Systems and Applications Symposium (NVMSA)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134271155","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"fsync-aware Multi-Buffer FTL for Improving the fsync Latency with Open-Channel SSDs","authors":"Somm Kim, Yunji Kang, Dongkun Shin","doi":"10.1109/NVMSA.2019.8863514","DOIUrl":"https://doi.org/10.1109/NVMSA.2019.8863514","url":null,"abstract":"Open-Channel SSDs are widely studied because of their advantages such as predictable latency, efficient data placement, and I/O scheduling. Currently, the Linux kernel includes pblk (the Physical Block Device), a host FTL that supports Open-Channel SSDs. In addition, recent studies expand the single-threaded architecture of pblk to a multi-threaded architecture: MT-FTL and QBLK. However, both pblk and these recent designs were built without considering fsync latency. Because the fsync system call is performed synchronously, it has a great effect on the performance of the system. In this paper, we propose FA-FTL, which is a host FTL considering fsync latency. Experiments show that FA-FTL is 141% higher than pblk and 119% higher than MT-FTL.","PeriodicalId":438544,"journal":{"name":"2019 IEEE Non-Volatile Memory Systems and Applications Symposium (NVMSA)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125532893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}