SOUP-N-SALAD: Allocation-Oblivious Access Latency Reduction with Asymmetric DRAM Microarchitectures

2017 IEEE International Symposium on High Performance Computer Architecture (HPCA). Pub Date: 2017-02-01. DOI: 10.1109/HPCA.2017.31

Yuhwan Ro, Hyunyoon Cho, Eojin Lee, Daejin Jung, Y. Son, Jung Ho Ahn, Jae W. Lee

Citations: 3

Abstract

Memory access latency has a significant impact on application performance. Unfortunately, the random access latency of DRAM has been scaling relatively slowly, and often directly affects the critical path of execution, especially for applications with insufficient locality or memory-level parallelism. The existing low-latency DRAM organizations either incur significant area overhead or burden the software stack with non-uniform access latency. This paper proposes two microarchitectural techniques to provide uniformly low access time over the entire DRAM chip. The first technique is SALAD, a new DRAM device architecture that provides symmetric access latency with asymmetric DRAM bank organizations. Because local regions have lower data transfer time due to their proximity to the I/O pads, SALAD applies high aspect-ratio (i.e., low-latency) mats only to remote regions to offset the difference in data transfer time, resulting in symmetrically low latency across regions. The second technique is SOUP (skewed organization of µ banks with pipelined accesses), which leverages asymmetry in column access latency within a region due to non-uniform distance to the column decoders. By starting I/O transfers as soon as data from near cells arrive, instead of waiting for the entire column data, SOUP further saves two memory clock cycles for column accesses for all regions. The resulting design, called SOUP-N-SALAD, improves IPC and EDP by 9.6% (11.2%) and 18.2% (21.8%) over the baseline DDR4 device, respectively, for memory-intensive SPEC CPU2006 workloads without any software modifications, while incurring only 3% (6%) area overhead.
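The latency-balancing idea in the abstract can be illustrated with a toy model. The sketch below is only an illustration under stated assumptions: the `Region` type, the function names, and every cycle count except the two-cycle SOUP saving are hypothetical and are not taken from the paper. It treats access latency as mat access time plus data transfer time to the I/O pads, and assumes the device exposes a single timing set by its slowest region.

```python
from dataclasses import dataclass

# Hypothetical cycle counts chosen for illustration only (not from the paper).
NORMAL_MAT_CYCLES = 10  # assumed access time of a regular mat
FAST_MAT_CYCLES = 7     # assumed access time of a high aspect-ratio mat
SOUP_SAVING_CYCLES = 2  # the abstract states SOUP saves two memory clock cycles


@dataclass(frozen=True)
class Region:
    name: str
    mat_cycles: int       # time spent inside the mats
    transfer_cycles: int  # data transfer time to the I/O pads


def access_cycles(region: Region, soup: bool = False) -> int:
    """Toy latency: mat time plus transfer time, minus the SOUP saving if enabled."""
    total = region.mat_cycles + region.transfer_cycles
    return total - SOUP_SAVING_CYCLES if soup else total


def uniform_latency(regions: list[Region], soup: bool = False) -> int:
    """A device with one uniform timing is bound by its slowest region."""
    return max(access_cycles(r, soup) for r in regions)


# Baseline: the same mats everywhere, so the remote region sets the latency.
baseline = [
    Region("local", NORMAL_MAT_CYCLES, transfer_cycles=2),
    Region("remote", NORMAL_MAT_CYCLES, transfer_cycles=5),
]

# SALAD-style layout: fast mats only in the remote region, offsetting its
# longer transfer time so both regions end up with the same total.
salad = [
    Region("local", NORMAL_MAT_CYCLES, transfer_cycles=2),
    Region("remote", FAST_MAT_CYCLES, transfer_cycles=5),
]

if __name__ == "__main__":
    print("baseline:", uniform_latency(baseline))
    print("SALAD:", uniform_latency(salad))
    print("SOUP-N-SALAD:", uniform_latency(salad, soup=True))
```

With these assumed numbers, the faster mats cancel the remote region's extra transfer time, so both regions share one latency and software never needs to know where data is allocated. The actual timing values and area trade-offs are those reported in the paper, not the ones used here.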
