A case for exploiting subarray-level parallelism (SALP) in DRAM

2012 39th Annual International Symposium on Computer Architecture (ISCA) Pub Date : 2012-06-09 DOI:10.1145/2366231.2337202

Yoongu Kim, V. Seshadri, Donghyuk Lee, Jamie Liu, O. Mutlu


Citations: 331

Abstract



Modern DRAMs have multiple banks to serve multiple memory requests in parallel. However, when two requests go to the same bank, they have to be served serially, exacerbating the high latency of off-chip memory. Adding more banks to the system to mitigate this problem incurs high system cost. Our goal in this work is to achieve the benefits of increasing the number of banks with a low cost approach. To this end, we propose three new mechanisms that overlap the latencies of different requests that go to the same bank. The key observation exploited by our mechanisms is that a modern DRAM bank is implemented as a collection of subarrays that operate largely independently while sharing few global peripheral structures. Our proposed mechanisms (SALP-1, SALP-2, and MASA) mitigate the negative impact of bank serialization by overlapping different components of the bank access latencies of multiple requests that go to different subarrays within the same bank. SALP-1 requires no changes to the existing DRAM structure and only needs reinterpretation of some DRAM timing parameters. SALP-2 and MASA require only modest changes (<0.15% area overhead) to the DRAM peripheral structures, which are much less design constrained than the DRAM core. Evaluations show that all our schemes significantly improve performance for both single-core systems and multi-core systems. Our schemes also interact positively with application-aware memory request scheduling in multi-core systems.
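The idea of overlapping latency components for same-bank requests that target different subarrays can be illustrated with a toy model. The sketch below is not the paper's mechanism or evaluation: the three latency components, their cycle counts, and the rule that the previous request's precharge can be hidden behind the next request's activation are all assumptions chosen for illustration, and the names `Request`, `serialized_latency`, and `overlapped_latency` are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical latency components, in arbitrary cycles (not from the paper).
ACTIVATE = 11
COLUMN_ACCESS = 11
PRECHARGE = 11


@dataclass(frozen=True)
class Request:
    bank: int
    subarray: int


def serialized_latency(requests):
    """Baseline: every same-bank request pays all three components back to back."""
    return len(requests) * (ACTIVATE + COLUMN_ACCESS + PRECHARGE)


def overlapped_latency(requests):
    """Assumed overlap rule: when consecutive requests go to different
    subarrays, the earlier request's precharge is hidden behind the later
    request's activation. Same-subarray requests stay fully serialized."""
    total = 0
    for i, req in enumerate(requests):
        total += ACTIVATE + COLUMN_ACCESS + PRECHARGE
        if i > 0 and requests[i - 1].subarray != req.subarray:
            total -= min(PRECHARGE, ACTIVATE)  # overlapped portion
    return total


# Two requests to the same bank, each assumed to need a different row.
different_subarrays = [Request(bank=0, subarray=0), Request(bank=0, subarray=1)]
same_subarray = [Request(bank=0, subarray=0), Request(bank=0, subarray=0)]

print(serialized_latency(different_subarrays))  # 66
print(overlapped_latency(different_subarrays))  # 55
print(overlapped_latency(same_subarray))        # 66
```

In this toy model the benefit appears only when consecutive requests fall in different subarrays of the same bank, which is the case the abstract targets; real timing constraints are considerably more detailed.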
