SOUP-N-SALAD: Allocation-Oblivious Access Latency Reduction with Asymmetric DRAM Microarchitectures

Yuhwan Ro, Hyunyoon Cho, Eojin Lee, Daejin Jung, Y. Son, Jung Ho Ahn, Jae W. Lee
{"title":"SOUP-N-SALAD: Allocation-Oblivious Access Latency Reduction with Asymmetric DRAM Microarchitectures","authors":"Yuhwan Ro, Hyunyoon Cho, Eojin Lee, Daejin Jung, Y. Son, Jung Ho Ahn, Jae W. Lee","doi":"10.1109/HPCA.2017.31","DOIUrl":null,"url":null,"abstract":"Memory access latency has a significant impact on application performance. Unfortunately, the random access latency of DRAM has been scaling relatively slowly, and often directly affects the critical path of execution, especially for applications with insufficient locality or memory-level parallelism. The existing low-latency DRAM organizations either incur significant area overhead or burden the software stack with non-uniform access latency. This paper proposes two microarchitectural techniques to provide uniformly low access time over the entire DRAM chip. The first technique is SALAD, a new DRAM device architecture that provides symmetric access latency with asymmetric DRAM bank organizations. Because local regions have lower data transfer time due to their proximity to the I/O pads, SALAD applies high aspect-ratio (i.e., low-latency) mats only to remote regions to offset the difference in data transfer time, resulting in symmetrically low latency across regions. The second technique is SOUP (skewed organization of µ banks with pipelined accesses), which leverages asymmetry in column access latency within a region due to non-uniform distance to the column decoders. By starting I/O transfers as soon as data from near cells arrive, instead of waiting for the entire column data, SOUP further saves two memory clock cycles for column accesses for all regions. The resulting design, called SOUP-N-SALAD, improves IPC and EDP by 9.6% (11.2%) and 18.2% (21.8%) over the baseline DDR4 device, respectively, for memory-intensive SPEC CPU2006 workloads without any software modifications, while incurring only 3% (6%) area overhead.","PeriodicalId":118950,"journal":{"name":"2017 IEEE International Symposium on High Performance Computer Architecture (HPCA)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE International Symposium on High Performance Computer Architecture (HPCA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCA.2017.31","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

Memory access latency has a significant impact on application performance. Unfortunately, the random access latency of DRAM has been scaling relatively slowly, and often directly affects the critical path of execution, especially for applications with insufficient locality or memory-level parallelism. The existing low-latency DRAM organizations either incur significant area overhead or burden the software stack with non-uniform access latency. This paper proposes two microarchitectural techniques to provide uniformly low access time over the entire DRAM chip. The first technique is SALAD, a new DRAM device architecture that provides symmetric access latency with asymmetric DRAM bank organizations. Because local regions have lower data transfer time due to their proximity to the I/O pads, SALAD applies high aspect-ratio (i.e., low-latency) mats only to remote regions to offset the difference in data transfer time, resulting in symmetrically low latency across regions. The second technique is SOUP (skewed organization of µ banks with pipelined accesses), which leverages asymmetry in column access latency within a region due to non-uniform distance to the column decoders. By starting I/O transfers as soon as data from near cells arrive, instead of waiting for the entire column data, SOUP further saves two memory clock cycles for column accesses for all regions. The resulting design, called SOUP-N-SALAD, improves IPC and EDP by 9.6% (11.2%) and 18.2% (21.8%) over the baseline DDR4 device, respectively, for memory-intensive SPEC CPU2006 workloads without any software modifications, while incurring only 3% (6%) area overhead.
非对称DRAM微架构下的分配无关访问延迟降低
内存访问延迟对应用程序性能有重大影响。不幸的是,DRAM的随机访问延迟的扩展速度相对较慢,并且经常直接影响执行的关键路径,特别是对于局部性或内存级并行性不足的应用程序。现有的低延迟DRAM组织要么导致大量的面积开销,要么使软件堆栈负担不均匀的访问延迟。本文提出了两种微架构技术,以在整个DRAM芯片上提供统一的低访问时间。第一种技术是SALAD,这是一种新的DRAM设备架构,可以通过不对称的DRAM库组织提供对称的访问延迟。由于本地区域靠近I/O垫,因此数据传输时间较短,因此SALAD仅对远程区域应用高纵横比(即低延迟)垫来抵消数据传输时间的差异,从而导致跨区域对称的低延迟。第二种技术是SOUP(带有管道访问的微银行歪斜组织),它利用了由于到列解码器的距离不均匀而导致的区域内列访问延迟的不对称性。通过在邻近单元的数据到达时立即开始I/O传输,而不是等待整个列数据,SOUP进一步为所有区域的列访问节省了两个内存时钟周期。由此产生的设计,称为SOUP-N-SALAD,在没有任何软件修改的情况下,对于内存密集型SPEC CPU2006工作负载,IPC和EDP分别比基线DDR4设备提高了9.6%(11.2%)和18.2%(21.8%),同时只产生3%(6%)的面积开销。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信