分阶段读取:减轻DRAM写入对DRAM读取的影响

IEEE International Symposium on High-Performance Comp Architecture Pub Date : 2012-02-25 DOI:10.1109/HPCA.2012.6168943

Niladrish Chatterjee, Naveen Muralimanohar, R. Balasubramonian, A. Davis, N. Jouppi

{"title":"分阶段读取:减轻DRAM写入对DRAM读取的影响","authors":"Niladrish Chatterjee, Naveen Muralimanohar, R. Balasubramonian, A. Davis, N. Jouppi","doi":"10.1109/HPCA.2012.6168943","DOIUrl":null,"url":null,"abstract":"Main memory latencies have always been a concern for system performance. Given that reads are on the critical path for CPU progress, reads must be prioritized over writes. However, writes must be eventually processed and they often delay pending reads. In fact, a single channel in the main memory system offers almost no parallelism between reads and writes. This is because a single off-chip memory bus is shared by reads and writes and the direction of the bus has to be explicitly turned around when switching from writes to reads. This is an expensive operation and its cost is amortized by carrying out a burst of writes or reads every time the bus direction is switched. As a result, no reads can be processed while a memory channel is busy servicing writes. This paper proposes a novel mechanism to boost read-write parallelism and perform useful components of read operations even when the memory system is busy performing writes. If some of the banks are busy servicing writes, we start issuing reads to the other idle banks. The results of these reads are stored in a few registers near the memory chip's I/O pads. These results are quickly returned immediately following the bus turnaround. The process is referred to as a Staged Read because it decouples a single read operation into two stages, with the first step being performed in parallel with writes. This innovation can also be viewed as a form of prefetch that is internal to a memory chip. The proposed technique works best when there is bank imbalance in the write stream. We also introduce a write scheduling algorithm that artificially creates bank imbalance and allows useful read operations to be performed during the write drain. Across a suite of memory-intensive workloads, we show that Staged Reads can boost throughput by up to 33% (average 7%) with an average DRAM access latency improvement of 17%, while incurring a very small cost (0.25%) in terms of memory chip area. The throughput improvements are even greater when considering write-intensive workloads (average 11%) or future systems (average 12%).","PeriodicalId":380383,"journal":{"name":"IEEE International Symposium on High-Performance Comp Architecture","volume":"19 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"61","resultStr":"{\"title\":\"Staged Reads: Mitigating the impact of DRAM writes on DRAM reads\",\"authors\":\"Niladrish Chatterjee, Naveen Muralimanohar, R. Balasubramonian, A. Davis, N. Jouppi\",\"doi\":\"10.1109/HPCA.2012.6168943\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Main memory latencies have always been a concern for system performance. Given that reads are on the critical path for CPU progress, reads must be prioritized over writes. However, writes must be eventually processed and they often delay pending reads. In fact, a single channel in the main memory system offers almost no parallelism between reads and writes. This is because a single off-chip memory bus is shared by reads and writes and the direction of the bus has to be explicitly turned around when switching from writes to reads. This is an expensive operation and its cost is amortized by carrying out a burst of writes or reads every time the bus direction is switched. As a result, no reads can be processed while a memory channel is busy servicing writes. This paper proposes a novel mechanism to boost read-write parallelism and perform useful components of read operations even when the memory system is busy performing writes. If some of the banks are busy servicing writes, we start issuing reads to the other idle banks. The results of these reads are stored in a few registers near the memory chip's I/O pads. These results are quickly returned immediately following the bus turnaround. The process is referred to as a Staged Read because it decouples a single read operation into two stages, with the first step being performed in parallel with writes. This innovation can also be viewed as a form of prefetch that is internal to a memory chip. The proposed technique works best when there is bank imbalance in the write stream. We also introduce a write scheduling algorithm that artificially creates bank imbalance and allows useful read operations to be performed during the write drain. Across a suite of memory-intensive workloads, we show that Staged Reads can boost throughput by up to 33% (average 7%) with an average DRAM access latency improvement of 17%, while incurring a very small cost (0.25%) in terms of memory chip area. The throughput improvements are even greater when considering write-intensive workloads (average 11%) or future systems (average 12%).\",\"PeriodicalId\":380383,\"journal\":{\"name\":\"IEEE International Symposium on High-Performance Comp Architecture\",\"volume\":\"19 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2012-02-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"61\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE International Symposium on High-Performance Comp Architecture\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HPCA.2012.6168943\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE International Symposium on High-Performance Comp Architecture","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCA.2012.6168943","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 61

摘要

主存延迟一直是系统性能的一个关注点。考虑到读在CPU进程的关键路径上，读必须优先于写。然而，写操作最终必须被处理，而且它们经常会延迟挂起的读操作。事实上，主存系统中的单个通道几乎没有提供读写之间的并行性。这是因为一个片外存储器总线是由读和写共享的，当从写切换到读时，总线的方向必须显式地反转。这是一个昂贵的操作，它的成本是通过每次总线方向切换时执行突发的写入或读取来平摊的。因此，当内存通道忙于服务写操作时，无法处理任何读操作。本文提出了一种新的机制来提高读写并行性，并在内存系统忙于写操作时执行读操作的有用组件。如果一些银行忙于处理写操作，我们开始向其他空闲银行发出读操作。这些读取的结果存储在内存芯片的I/O盘附近的几个寄存器中。这些结果会在公共汽车转弯后立即返回。这个过程被称为阶段性读，因为它将单个读操作解耦为两个阶段，第一步与写操作并行执行。这种创新也可以被看作是存储芯片内部的一种预取形式。当写流中存在银行不平衡时，所提出的技术效果最好。我们还引入了一种写调度算法，该算法人为地造成银行不平衡，并允许在写耗尽期间执行有用的读操作。在一组内存密集型工作负载中，我们发现分阶段读取可以将吞吐量提高33%(平均7%)，平均DRAM访问延迟提高17%，同时在内存芯片面积方面产生非常小的成本(0.25%)。在考虑写密集型工作负载(平均11%)或未来系统(平均12%)时，吞吐量的改进甚至更大。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Staged Reads: Mitigating the impact of DRAM writes on DRAM reads

Main memory latencies have always been a concern for system performance. Given that reads are on the critical path for CPU progress, reads must be prioritized over writes. However, writes must be eventually processed and they often delay pending reads. In fact, a single channel in the main memory system offers almost no parallelism between reads and writes. This is because a single off-chip memory bus is shared by reads and writes and the direction of the bus has to be explicitly turned around when switching from writes to reads. This is an expensive operation and its cost is amortized by carrying out a burst of writes or reads every time the bus direction is switched. As a result, no reads can be processed while a memory channel is busy servicing writes. This paper proposes a novel mechanism to boost read-write parallelism and perform useful components of read operations even when the memory system is busy performing writes. If some of the banks are busy servicing writes, we start issuing reads to the other idle banks. The results of these reads are stored in a few registers near the memory chip's I/O pads. These results are quickly returned immediately following the bus turnaround. The process is referred to as a Staged Read because it decouples a single read operation into two stages, with the first step being performed in parallel with writes. This innovation can also be viewed as a form of prefetch that is internal to a memory chip. The proposed technique works best when there is bank imbalance in the write stream. We also introduce a write scheduling algorithm that artificially creates bank imbalance and allows useful read operations to be performed during the write drain. Across a suite of memory-intensive workloads, we show that Staged Reads can boost throughput by up to 33% (average 7%) with an average DRAM access latency improvement of 17%, while incurring a very small cost (0.25%) in terms of memory chip area. The throughput improvements are even greater when considering write-intensive workloads (average 11%) or future systems (average 12%).

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

IEEE International Symposium on High-Performance Comp Architecture

自引率

0.00%

发文量