2019 IEEE International Conference on Networking, Architecture and Storage (NAS)最新文献

Load-aware Elastic Data Reduction and Re-computation for Adaptive Mesh Refinement 基于负载感知的弹性数据约简与自适应网格细化的重计算

2019 IEEE International Conference on Networking, Architecture and Storage (NAS) Pub Date : 2019-08-01 DOI: 10.1109/NAS.2019.8834727

Mengxiao Wang, Huizhang Luo, Qing Liu, Hong Jiang

{"title":"Load-aware Elastic Data Reduction and Re-computation for Adaptive Mesh Refinement","authors":"Mengxiao Wang, Huizhang Luo, Qing Liu, Hong Jiang","doi":"10.1109/NAS.2019.8834727","DOIUrl":"https://doi.org/10.1109/NAS.2019.8834727","url":null,"abstract":"The increasing performance gap between computation and I/O creates huge data management challenges for simulation-based scientific discovery. Data reduction, among others, is deemed to be a promising technique to bridge the gap through reducing the amount of data migrated to persistent storage. However, the reduction performance is still far from what is being demanded from production applications. To this end, we propose a new methodology that aggressively reduces data despite the substantial loss of information, and re-computes the original accuracy on-demand. As a result, our scheme creates an illusion of a fast and large storage medium with the availability of high-accuracy data. We further design a load-aware data reduction strategy that monitors the I/O overhead at runtime, and dynamically adjusts the reduction ratio. We verify the efficacy of our methodology through adaptive mesh refinement, a popular numerical technique for solving partial differential equations. We evaluate data reduction and selective data re-computation on Titan, using a real application in FLASH and mini-applications in Chombo. To clearly demonstrate the benefits of re-computation, we compare it with other state-of-the-art data reduction methods including SZ, ZFP, FPC and deduplication, and it is shown to be superior in both write and read speeds, particularly when a small amount of data (e.g., 1%) need to be retrieved, as well as reduction ratio. Our results confirm that data reduction and selective data re-computation can 1) reduce the performance gap between I/O and compute via aggressively reducing AMR levels, and more importantly 2) can recover the target accuracy efficiently for AMR through re-computation.","PeriodicalId":230796,"journal":{"name":"2019 IEEE International Conference on Networking, Architecture and Storage (NAS)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124694911","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

Optimizing Tail Latency of LDPC based Flash Memory Storage Systems Via Smart Refresh 基于LDPC的闪存存储系统尾部延迟智能刷新优化

2019 IEEE International Conference on Networking, Architecture and Storage (NAS) Pub Date : 2019-08-01 DOI: 10.1109/NAS.2019.8834728

Yina Lv, Liang Shi, Qiao Li, Congming Gao, C. Xue, E. Sha

{"title":"Optimizing Tail Latency of LDPC based Flash Memory Storage Systems Via Smart Refresh","authors":"Yina Lv, Liang Shi, Qiao Li, Congming Gao, C. Xue, E. Sha","doi":"10.1109/NAS.2019.8834728","DOIUrl":"https://doi.org/10.1109/NAS.2019.8834728","url":null,"abstract":"Flash memory has been developed with bit density improvement, technology scaling, and 3D stacking. With this trend, its reliability has been degraded significantly. Error correction code, low density parity code (LDPC), which has strong error correction capability, has been employed to solve this issue. However, one of the critical issues of LDPC is that it would introduce a long decoding latency on devices with low reliability. In this case, tail latency would happen, which will significantly impact the quality of service (QoS). In this work, a set of smart refresh schemes is proposed to optimize the tail latency. The basic idea of the work is to refresh data when the accessed data has a long decoding latency. Two smart refresh schemes are proposed for this work: The first refresh scheme is designed to refresh long access latency data when it is accessed several times for access performance optimization; The second refresh scheme is designed to periodical detecting data with extremely long access latency and refreshing them for tail latency optimization. Experiment results show that the proposed schemes are able to significantly improve the tail latency and access performance with little overhead.","PeriodicalId":230796,"journal":{"name":"2019 IEEE International Conference on Networking, Architecture and Storage (NAS)","volume":"339 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113982818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 2

NAS 2019 Messages

2019 IEEE International Conference on Networking, Architecture and Storage (NAS) Pub Date : 2019-08-01 DOI: 10.1109/nas.2019.8834712

引用次数: 0

Thermo-GC: Reducing Write Amplification by Tagging Migrated Pages during Garbage Collection 热gc:通过在垃圾收集期间标记迁移页面来减少写放大

2019 IEEE International Conference on Networking, Architecture and Storage (NAS) Pub Date : 2019-08-01 DOI: 10.1109/NAS.2019.8834722

Jing Yang, Shuyi Pei

{"title":"Thermo-GC: Reducing Write Amplification by Tagging Migrated Pages during Garbage Collection","authors":"Jing Yang, Shuyi Pei","doi":"10.1109/NAS.2019.8834722","DOIUrl":"https://doi.org/10.1109/NAS.2019.8834722","url":null,"abstract":"Flash memory based solid-state drive (SSD) has been deployed in various systems because of its significant advantages over hard disk drive in terms of throughput and IOPS. One inherent operation that is necessary in SSD is garbage collection (GC), a procedure that selects an erasure candidate block and moves valid data on the selected candidate to another block. The performance of SSD is greatly influenced by GC. While existing studies have made advances in minimizing GC cost, few took advantages of the procedure of GC itself. As GC goes on, valid pages in an erasure candidate block tend to have similar lifetimes that can be exploited to minimize page’s movements. In this paper, we introduce Thermo-GC. The idea is to identify data’s hotness during GC operations and group data that have similar lifetimes to the same block. By clustering valid pages based on their hotness, Thermo-GC can minimize valid page movements and reduce GC cost. Experiment results show that Thermo-GC reduces data movements during GC by 78% and write amplification factor by 29.7% on average, implying extended lifetimes of SSDs.","PeriodicalId":230796,"journal":{"name":"2019 IEEE International Conference on Networking, Architecture and Storage (NAS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130021923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 3

HCMA: Supporting High Concurrency of Memory Accesses with Scratchpad Memory in FPGAs HCMA: fpga中使用刮板存储器支持高并发内存访问

2019 IEEE International Conference on Networking, Architecture and Storage (NAS) Pub Date : 2019-08-01 DOI: 10.1109/NAS.2019.8834726

Yangyang Zhao, Yuhang Liu, Wei Li, Mingyu Chen

{"title":"HCMA: Supporting High Concurrency of Memory Accesses with Scratchpad Memory in FPGAs","authors":"Yangyang Zhao, Yuhang Liu, Wei Li, Mingyu Chen","doi":"10.1109/NAS.2019.8834726","DOIUrl":"https://doi.org/10.1109/NAS.2019.8834726","url":null,"abstract":"Currently many researches focus on new methods of accelerating memory accesses between memory controller and memory modules. However, the absence of an accelerator for memory accesses between CPU and memory controller wastes the performance benefits of new methods. Therefore, we propose a coordinated batch method to support high concurrency of memory accesses (HCMA). Compared to the conventional method of holding outstanding memory access requests in miss status handling registers (MSHRs), HCMA method takes advantage of scratchpad memory in FPGAs or SoCs to circumvent the limitation of MSHR entries. The concurrency of requests is only limited by the capacity of scratchpad memory. Moreover, to avoid the higher latency when searching more entries, we design an efficient coordinating mechanism based on circular queues.We evaluate the performance of HCMA method on an MP-SoC FPGA platform. Compared to conventional methods based on MSHRs, HCMA method supports ten times of concurrent memory accesses (from 10 to 128 entries on our evaluation platform). HCMA method achieves up to 2.72× memory bandwidth utilization for applications that access memory with massive fine-grained random requests, and to 3.46× memory bandwidth utilization for stream-based memory accesses. For real applications like CG, our method improves speedup performance by 29.87%.","PeriodicalId":230796,"journal":{"name":"2019 IEEE International Conference on Networking, Architecture and Storage (NAS)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116700796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Ares: A Scalable High-Performance Passive Measurement Tool Using a Multicore System Ares:使用多核系统的可扩展高性能被动测量工具

2019 IEEE International Conference on Networking, Architecture and Storage (NAS) Pub Date : 2019-08-01 DOI: 10.1109/NAS.2019.8834734

Xiaoban Wu, Yan Luo, Jeronimo Bezerra, Liang-Min Wang

引用次数: 3

Learning Workflow Scheduling on Multi-Resource Clusters 学习多资源集群的工作流调度

2019 IEEE International Conference on Networking, Architecture and Storage (NAS) Pub Date : 2019-08-01 DOI: 10.1109/NAS.2019.8834720

Yang Hu, C. D. Laat, Zhiming Zhao

引用次数: 9

Contention Aware Workload and Resource Co-Scheduling on Power-Bounded Systems 电力有限系统的竞争感知工作负载和资源协同调度

2019 IEEE International Conference on Networking, Architecture and Storage (NAS) Pub Date : 2019-08-01 DOI: 10.1109/NAS.2019.8834721

Pengfei Zou, Xizhou Feng, Rong Ge

{"title":"Contention Aware Workload and Resource Co-Scheduling on Power-Bounded Systems","authors":"Pengfei Zou, Xizhou Feng, Rong Ge","doi":"10.1109/NAS.2019.8834721","DOIUrl":"https://doi.org/10.1109/NAS.2019.8834721","url":null,"abstract":"As power becomes a top challenge in HPC systems and data centers, how to sustain the system performance growth under limited available or permissible power becomes an important research topic. Traditionally, researchers have explored collocating non-interfering jobs on the same nodes to improve system performance. Nevertheless, power limits reduce the capacity of components, nodes, and systems, and induce or aggravate contention between jobs. Using prior power-oblivious job collocation strategies on power limited systems can adversely degrade system throughput. In this paper, we quantitatively estimate contention induced by power limits, and propose a Contention-Aware Power-bounded Scheduling (CAPS) for systems with finite power budgets. CAPS chooses to collocate jobs that are complementary when power is limited, and distributes the available power to nodes and components to minimize their interference. Experimental results show that CAPS improves system throughput and power efficiency by 10% or greater than power-oblivious job collocation strategies, depending on the available power, for hybrid MPI/OpenMP benchmarks on a 192-core 8-node cluster.","PeriodicalId":230796,"journal":{"name":"2019 IEEE International Conference on Networking, Architecture and Storage (NAS)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124082411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 3

Leveraging Array Mapped Tries in KSM for Lightweight Memory Deduplication 利用KSM中的数组映射尝试实现轻量级内存重复数据删除

2019 IEEE International Conference on Networking, Architecture and Storage (NAS) Pub Date : 2019-08-01 DOI: 10.1109/NAS.2019.8834730

Lingjing You, Yongkun Li, Fan Guo, Yinlong Xu, Jinzhong Chen, Liu Yuan

{"title":"Leveraging Array Mapped Tries in KSM for Lightweight Memory Deduplication","authors":"Lingjing You, Yongkun Li, Fan Guo, Yinlong Xu, Jinzhong Chen, Liu Yuan","doi":"10.1109/NAS.2019.8834730","DOIUrl":"https://doi.org/10.1109/NAS.2019.8834730","url":null,"abstract":"In cloud computing, how to use limited hardware resources to meet the increasing demands has become a major issue. KSM (Kernel Same-page Merging) is a content-based page sharing mechanism used in Linux that merges equal memory pages, thereby significantly reducing memory usage and increasing the density of virtual machines or containers. However, KSM introduces a large overhead in CPU and memory bandwidth usage due to the use of red-black trees and content-based page comparison. To reduce the deduplication overhead, in this paper, we propose a new design called AMT-KSM, which leverages array mapped tries to realize lightweight memory deduplication. The basic idea is to divide each memory page into multiple segments and use the concatenated strings of the hash values of segments as indexed keys in the tries. By doing this, we can significantly reduce the time required for searching duplicate pages as well as the number of page comparisons. We conduct experiments to evaluate the performance of our design, and results show that compared with the conventional KSM, AMT-KSM can reduce up to 44.9% CPU usage and 31.6% memory bandwidth usage.","PeriodicalId":230796,"journal":{"name":"2019 IEEE International Conference on Networking, Architecture and Storage (NAS)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128101178","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 5

NAS 2019 Keynotes

2019 IEEE International Conference on Networking, Architecture and Storage (NAS) Pub Date : 2019-08-01 DOI: 10.1109/nas.2019.8834717

引用次数: 0