MAC: Memory Access Coalescer for 3D-Stacked Memory

Proceedings of the 48th International Conference on Parallel Processing. Pub Date: 2019-08-05. DOI: 10.1145/3337821.3337867

Xi Wang, Antonino Tumeo, John D. Leidel, Jie Li, Yong Chen


Citations: 9

Abstract

Emerging data-intensive applications, such as graph analytics and data mining, exhibit irregular memory access patterns. Research has shown that traditional cache-based processor architectures, which exploit locality and regular patterns to mitigate the memory-wall issue, are inefficient for these memory-bound applications. Meanwhile, novel 3D-stacked memory devices, such as Hybrid Memory Cube (HMC) and High Bandwidth Memory (HBM), promise significant increases in bandwidth that appear extremely appealing for memory-bound applications. However, conventional memory interfaces designed for cache-based architectures and JEDEC DDR devices fit poorly with 3D-stacked memory, which leads to significant under-utilization of the promised high bandwidth. In response to these issues, we propose MAC (Memory Access Coalescer), a coalescing unit for 3D-stacked memory. We discuss the design and implementation of MAC in the context of a custom-designed cache-less architecture targeted at data-intensive, irregular applications. Using a custom simulation infrastructure based on the RISC-V toolchain, we show that MAC achieves a coalescing efficiency of 52.85% on average and improves the performance of the memory system by 60.73% on average for a large set of irregular workloads.
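To make the idea of coalescing concrete, the following is a minimal software sketch, not the hardware design described in the paper. It assumes that raw requests falling into the same aligned block can be merged into one memory request, and it assumes that coalescing efficiency means the fraction of raw requests eliminated by merging; the abstract does not define the metric, so this definition is a guess. The 64-byte block size and the names `coalesce` and `coalescing_efficiency` are hypothetical.

```python
from collections import defaultdict

BLOCK_BYTES = 64  # assumed request granularity; not taken from the paper


def coalesce(addresses, block_bytes=BLOCK_BYTES):
    """Group byte addresses by the aligned block that contains them."""
    blocks = defaultdict(list)
    for addr in addresses:
        base = addr - (addr % block_bytes)  # aligned start of the block
        blocks[base].append(addr)
    return dict(blocks)


def coalescing_efficiency(addresses, block_bytes=BLOCK_BYTES):
    """Fraction of raw requests removed by merging (assumed definition)."""
    if not addresses:
        return 0.0
    merged = coalesce(addresses, block_bytes)
    return 1.0 - len(merged) / len(addresses)


# Eight raw requests that touch four distinct 64-byte blocks.
raw = [0, 8, 16, 64, 72, 4096, 4104, 8192]
merged = coalesce(raw)
print(len(raw), "raw requests ->", len(merged), "memory requests")
print(f"coalescing efficiency: {coalescing_efficiency(raw):.2%}")
```

Under these assumptions, an access stream with no two addresses in the same block would yield an efficiency of zero, which is the case where a coalescer of this kind offers no benefit.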

