TraceRAR: An I/O Performance Evaluation Tool for Replaying, Analyzing, and Regenerating Traces
Bingzhe Li, F. Toussi, C. Anderson, D. Lilja, D. Du
2017 International Conference on Networking, Architecture, and Storage (NAS), August 2017. DOI: https://doi.org/10.1109/NAS.2017.8026880

Abstract: Adopting a new technology, such as a new storage system, is a complicated process because the supporting ecosystems also have to change. As a result, any new technology requires exhaustive performance evaluation to justify the cost of switching. However, synthetic workloads or benchmarks typically cannot completely characterize an actual workload, while the time and effort required to obtain an appropriate trace can be prohibitive. This work presents a block-level performance measurement tool for storage systems that combines a trace replayer, a trace characteristics analyzer, and a trace regenerator. The tool is compatible with several platforms, including Linux and AIX. Its purpose is to evaluate system performance when executing a given application and to help users determine which system best fits their specific application. Additionally, the trace analyzer can provide details about the characteristics of a given trace. Using the trace analysis results, the regenerator can produce arbitrarily long I/O traces to improve the accuracy of the performance evaluation. The tool can also be used to determine whether a particular system is suited to a specific application and to make comparisons between systems.
A Write-Through Cache Method to Improve Small Write Performance of SSD-Based RAID
Linjun Mei, D. Feng, Jianxi Chen, Lingfang Zeng, Jingning Liu
2017 International Conference on Networking, Architecture, and Storage (NAS), August 2017. DOI: https://doi.org/10.1109/NAS.2017.8026840

Abstract: With advances in technology and declining prices, flash-based solid-state drives (SSDs) are increasingly used by storage vendors to construct RAIDs. Because an SSD has no seek or rotational latency, its read performance is much better than that of an HDD. However, the small-write performance of an SSD is limited by inherent characteristics such as out-of-place updates and garbage collection. Traditional parity-based RAID also suffers from a small-write problem because of parity updates. SSD-based RAID, called RAIS, is generally based on the traditional RAID design and implementation. Consequently, handling small write requests is a serious challenge when SSDs are used to construct parity-based RAID. In a RAIS storage system, small write requests not only result in poor performance but also shorten the lifetime of each SSD. In this paper, we propose a novel write-through cache method, called CRAIS5, which uses RAM as the write cache of RAIS5 and adopts write-through mode to delay parity updates. The write-through cache method makes full use of flash characteristics and removes the pre-read operation. CRAIS5 improves small-write performance and reduces erase time. We have implemented a CRAIS5 prototype in the DiskSim simulator and used real traces to evaluate its performance. The evaluations demonstrate that CRAIS5 outperforms RAIS5 and PPC by 42.82% and 34.49% on average, respectively.
Reducing Chunk Fragmentation for In-Line Delta Compressed and Deduplicated Backup Systems
Yucheng Zhang, D. Feng, Yu Hua, Yuchong Hu, Wen Xia, Min Fu, Xiaolan Tang, Zhikun Wang, Fangting Huang, Yukun Zhou
2017 International Conference on Networking, Architecture, and Storage (NAS), August 2017. DOI: https://doi.org/10.1109/NAS.2017.8026874

Abstract: Chunk-level deduplication, while effective at removing duplicate chunks, introduces chunk fragmentation, which degrades restore performance. Rewriting algorithms have been proposed to reduce chunk fragmentation and accelerate restore speed. Delta compression can remove redundancy between non-duplicate but similar chunks that chunk-level deduplication cannot eliminate, and some applications use it as a complement to chunk-level deduplication to attain extra space and bandwidth savings. However, we observe that delta compression introduces a new type of chunk fragmentation, stemming from delta compressed chunks whose base chunks are fragmented. We refer to such delta compressed chunks as base-fragmented chunks. We find that this new type of chunk fragmentation has a more severe impact on restore performance than the fragmentation introduced by chunk-level deduplication, and that it cannot be reduced by existing rewriting algorithms. To address the problem caused by base-fragmented chunks, we propose SDC, a scheme that selectively performs delta compression after chunk-level deduplication. The main idea behind SDC is to simulate a restore cache to identify the non-base-fragmented chunks and to perform delta compression only for these chunks, thus avoiding the new type of chunk fragmentation. Owing to the locality among backup streams, most of the non-base-fragmented chunks can be detected by the simulated restore cache. Experimental results based on real-world datasets show that SDC improves the restore performance of the delta compressed and deduplicated backup system by 1.93X-7.48X and achieves 95.5%-97.4% of its compression ratio, while imposing negligible impact on backup throughput.
{"title":"Performance Evaluation and Modeling of HPC I/O on Non-Volatile Memory","authors":"W. Liu, Kai Wu, Jialin Liu, F. Chen, Dong Li","doi":"10.1109/NAS.2017.8026869","DOIUrl":"https://doi.org/10.1109/NAS.2017.8026869","url":null,"abstract":"HPC applications pose high demands on I/O performance and storage capability. The emerging non-volatile memory (NVM) techniques offer low- latency, high bandwidth, and persistence for HPC applications. However, the existing I/O stack are designed and optimized based on an assumption of disk-based storage. To effectively use NVM, we must re-examine the existing high performance computing (HPC) I/O sub-system to properly integrate NVM into it. Using NVM as a fast storage, the previous assumption on the inferior performance of storage (e.g., hard drive) is not valid any more. The performance problem caused by slow storage may be mitigated; the existing mechanisms to narrow the performance gap between storage and CPU may be unnecessary and result in large overhead. Thus fully understanding the impact of introducing NVM into the HPC software stack demands a thorough performance study. In this paper, we analyze and model the performance of I/O intensive HPC applications with NVM as a block device. We study the performance from three perspectives: (1) the impact of NVM on the performance of traditional page cache; (2) a performance comparison between MPI individual I/O and POSIX I/O; and (3) the impact of NVM on the performance of collective I/O. We reveal the diminishing effects of page cache, minor performance difference between MPI individual I/O and POSIX I/O, and performance disadvantage of collective I/O on NVM due to unnecessary data shuffling. We also model the performance of MPI collective I/O and study the complex interaction between data shuffling, storage performance, and I/O access patterns.","PeriodicalId":222161,"journal":{"name":"2017 International Conference on Networking, Architecture, and Storage (NAS)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115191444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}