Transparent Dual Memory Compression Architecture

Seikwon Kim, Seonyoung Lee, Taehoon Kim, Jaehyuk Huh
{"title":"Transparent Dual Memory Compression Architecture","authors":"Seikwon Kim, Seonyoung Lee, Taehoon Kim, Jaehyuk Huh","doi":"10.1109/PACT.2017.12","DOIUrl":null,"url":null,"abstract":"The increasing memory requirements of big data applications have been driving the precipitous growth of memory capacity in server systems. To maximize the efficiency of external memory, HW-based memory compression techniques have been proposed to increase effective memory capacity. Although such memory compression techniques can improve the memory efficiency significantly, a critical trade-off exists in the HW-based compression techniques. As the memory blocks need to be decompressed as quickly as possible to serve cache misses, latency-optimized techniques apply compression at the cacheline granularity, achieving the decompression latency of less than a few cycles. However, such latency-optimized techniques can lose the potential high compression ratios of capacity-optimized techniques, which compress larger memory blocks with longer latency algorithms.Considering the fundamental trade-off in the memory compression, this paper proposes a transparent dual memory compression (DMC) architecture, which selectively uses two compression algorithms with distinct latency and compression characteristics. Exploiting the locality of memory accesses, the proposed architecture compresses less frequently accessed blocks with a capacity-optimized compression algorithm, while keeping recently accessed blocks compressed with a latency-optimized one. Furthermore, instead of relying on the support from the virtual memory system to locate compressed memory blocks, the study advocates a HW-based translation between the uncompressed address space and compressed physical space. This OS-transparent approach eliminates conflicts between compression efficiency and large page support adopted to reduce TLB misses. The proposed compression architecture is applied to the Hybrid Memory Cube (HMC) with a logic layer under the stacked DRAMs. The experimental results show that the proposed compression architecture provides 54% higher compression ratio than the state-of-the-art latency-optimized technique, with no performance degradation over the baseline system without compression.","PeriodicalId":438103,"journal":{"name":"2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"19","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PACT.2017.12","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 19

Abstract

The increasing memory requirements of big data applications have been driving the precipitous growth of memory capacity in server systems. To maximize the efficiency of external memory, HW-based memory compression techniques have been proposed to increase effective memory capacity. Although such memory compression techniques can improve memory efficiency significantly, a critical trade-off exists in HW-based compression. Because memory blocks must be decompressed as quickly as possible to serve cache misses, latency-optimized techniques apply compression at cacheline granularity, achieving decompression latencies of a few cycles or less. However, such latency-optimized techniques forgo the potentially high compression ratios of capacity-optimized techniques, which compress larger memory blocks with longer-latency algorithms.

Considering this fundamental trade-off in memory compression, this paper proposes a transparent dual memory compression (DMC) architecture, which selectively uses two compression algorithms with distinct latency and compression characteristics. Exploiting the locality of memory accesses, the proposed architecture compresses less frequently accessed blocks with a capacity-optimized compression algorithm, while keeping recently accessed blocks compressed with a latency-optimized one. Furthermore, instead of relying on support from the virtual memory system to locate compressed memory blocks, the study advocates a HW-based translation between the uncompressed address space and the compressed physical space. This OS-transparent approach eliminates conflicts between compression efficiency and the large-page support adopted to reduce TLB misses. The proposed compression architecture is applied to the Hybrid Memory Cube (HMC), which provides a logic layer under the stacked DRAMs. Experimental results show that the proposed architecture provides a 54% higher compression ratio than the state-of-the-art latency-optimized technique, with no performance degradation over a baseline system without compression.
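To make the hot/cold policy described in the abstract concrete, the Python sketch below models its behavior in software. Everything here is an illustrative assumption rather than the paper's design: the names (DMCController, HOT_THRESHOLD), trailing-zero elision standing in for a few-cycle latency-optimized compressor, zlib standing in for the capacity-optimized algorithm, and a dictionary standing in for the HW-managed translation table that maps uncompressed block addresses to compressed data without touching the OS page tables. The actual DMC is a hardware controller in the HMC logic layer.

```python
# Behavioral sketch (not the paper's implementation): hot blocks stay in a
# latency-optimized format, cold blocks are demoted to a capacity-optimized
# format, and a table maps uncompressed addresses to compressed payloads.

import zlib
from collections import defaultdict

CACHELINE = 64        # bytes; latency-optimized compression granularity
HOT_THRESHOLD = 4     # assumed access count separating hot from cold blocks


def fast_compress(line: bytes) -> bytes:
    # Stand-in for a few-cycle HW scheme: elide trailing zero bytes.
    return line.rstrip(b"\x00")


def fast_decompress(payload: bytes) -> bytes:
    # Restore the elided zero bytes to recover the full cacheline.
    return payload.ljust(CACHELINE, b"\x00")


class DMCController:
    def __init__(self):
        self.access_count = defaultdict(int)
        # Uncompressed addr -> ("fast", [lines]) or ("dense", blob).
        # In the real design this mapping is a HW translation table,
        # which is what keeps the scheme OS-transparent.
        self.store = {}

    def write(self, addr: int, block: bytes):
        # New blocks start latency-optimized so near-term reads are fast.
        assert len(block) % CACHELINE == 0
        lines = [block[i:i + CACHELINE]
                 for i in range(0, len(block), CACHELINE)]
        self.store[addr] = ("fast", [fast_compress(l) for l in lines])

    def read(self, addr: int) -> bytes:
        self.access_count[addr] += 1
        kind, payload = self.store[addr]
        if kind == "fast":
            return b"".join(fast_decompress(l) for l in payload)
        # Dense format: longer-latency decompression (zlib as a SW proxy
        # for a capacity-optimized algorithm).
        block = zlib.decompress(payload)
        # Re-promote on access so a recently used block decompresses fast.
        self.write(addr, block)
        return block

    def demote_cold(self):
        # Periodic sweep: move infrequently accessed blocks to the dense
        # format to raise the effective compression ratio.
        for addr, (kind, payload) in list(self.store.items()):
            if kind == "fast" and self.access_count[addr] < HOT_THRESHOLD:
                block = b"".join(fast_decompress(l) for l in payload)
                self.store[addr] = ("dense", zlib.compress(block, 9))
            self.access_count[addr] = 0  # age counters each epoch


if __name__ == "__main__":
    dmc = DMCController()
    dmc.write(0x1000, b"hello".ljust(4096, b"\x00"))
    dmc.demote_cold()                        # cold -> capacity-optimized
    assert dmc.read(0x1000)[:5] == b"hello"  # read re-promotes the block
```

The sketch only captures the policy structure; the paper's contribution is doing this transparently in hardware, so neither the OS nor large-page support has to change to locate compressed blocks.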