通过利用过多的高密度TSV带宽，优化了3d堆叠内存架构

HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture Pub Date : 2010-04-01 DOI:10.1109/HPCA.2010.5416628

Dong Hyuk Woo, N. Seong, D. L. Lewis, H. Lee

{"title":"通过利用过多的高密度TSV带宽，优化了3d堆叠内存架构","authors":"Dong Hyuk Woo, N. Seong, D. L. Lewis, H. Lee","doi":"10.1109/HPCA.2010.5416628","DOIUrl":null,"url":null,"abstract":"Memory bandwidth has become a major performance bottleneck as more and more cores are integrated onto a single die, demanding more and more data from the system memory. Several prior studies have demonstrated that this memory bandwidth problem can be addressed by employing a 3D-stacked memory architecture, which provides a wide, high frequency memory-bus interface. Although previous 3D proposals already provide as much bandwidth as a traditional L2 cache can consume, the dense through-silicon-vias (TSVs) of 3D chip stacks can provide still more bandwidth. In this paper, we contest that we need to re-architect our memory hierarchy, including the L2 cache and DRAM interface, so that it can take full advantage of this massive bandwidth. Our technique, SMART-3D, is a new 3D-stacked memory architecture with a vertical L2 fetch/write-back network using a large array of TSVs. Simply stated, we leverage the TSV bandwidth to hide latency behind very large data transfers. We analyze the design trade-offs for the DRAM arrays, careful enough to avoid compromising the DRAM density because of TSV placement. Moreover, we propose an efficient mechanism to manage the false sharing problem when implementing SMART-3D in a multi-socket system. For single-threaded memory-intensive applications, the SMART-3D architecture achieves speedups from 1.53 to 2.14 over planar designs and from 1.27 to 1.72 over prior 3D designs. We achieve similar speedups for multi-program and multi-threaded workloads on multi-core and multi-socket processors. Furthermore, SMART-3D can even lower the energy consumption in the L2 cache and 3D DRAM for it reduces the total number of row buffer misses.","PeriodicalId":368621,"journal":{"name":"HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture","volume":"64 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"255","resultStr":"{\"title\":\"An optimized 3D-stacked memory architecture by exploiting excessive, high-density TSV bandwidth\",\"authors\":\"Dong Hyuk Woo, N. Seong, D. L. Lewis, H. Lee\",\"doi\":\"10.1109/HPCA.2010.5416628\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Memory bandwidth has become a major performance bottleneck as more and more cores are integrated onto a single die, demanding more and more data from the system memory. Several prior studies have demonstrated that this memory bandwidth problem can be addressed by employing a 3D-stacked memory architecture, which provides a wide, high frequency memory-bus interface. Although previous 3D proposals already provide as much bandwidth as a traditional L2 cache can consume, the dense through-silicon-vias (TSVs) of 3D chip stacks can provide still more bandwidth. In this paper, we contest that we need to re-architect our memory hierarchy, including the L2 cache and DRAM interface, so that it can take full advantage of this massive bandwidth. Our technique, SMART-3D, is a new 3D-stacked memory architecture with a vertical L2 fetch/write-back network using a large array of TSVs. Simply stated, we leverage the TSV bandwidth to hide latency behind very large data transfers. We analyze the design trade-offs for the DRAM arrays, careful enough to avoid compromising the DRAM density because of TSV placement. Moreover, we propose an efficient mechanism to manage the false sharing problem when implementing SMART-3D in a multi-socket system. For single-threaded memory-intensive applications, the SMART-3D architecture achieves speedups from 1.53 to 2.14 over planar designs and from 1.27 to 1.72 over prior 3D designs. We achieve similar speedups for multi-program and multi-threaded workloads on multi-core and multi-socket processors. Furthermore, SMART-3D can even lower the energy consumption in the L2 cache and 3D DRAM for it reduces the total number of row buffer misses.\",\"PeriodicalId\":368621,\"journal\":{\"name\":\"HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture\",\"volume\":\"64 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2010-04-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"255\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HPCA.2010.5416628\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCA.2010.5416628","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 255

摘要

随着越来越多的核心集成到单个芯片上，对系统内存的要求越来越高，内存带宽已经成为主要的性能瓶颈。先前的一些研究已经证明，这种内存带宽问题可以通过采用3d堆叠内存架构来解决，该架构提供了一个宽、高频的内存总线接口。虽然之前的3D方案已经提供了传统L2缓存所能消耗的带宽，但密集的硅通孔(tsv) 3D芯片堆栈可以提供更多的带宽。在本文中，我们认为我们需要重新构建我们的内存层次结构，包括L2缓存和DRAM接口，以便它可以充分利用这个巨大的带宽。我们的技术SMART-3D是一种新的3d堆叠内存架构，使用大量tsv阵列，具有垂直L2读取/回写网络。简单地说，我们利用TSV带宽来隐藏非常大的数据传输背后的延迟。我们分析了DRAM阵列的设计权衡，以避免由于TSV放置而影响DRAM密度。此外，我们提出了一种有效的机制来管理在多插座系统中实现SMART-3D时的错误共享问题。对于单线程内存密集型应用程序，SMART-3D架构比平面设计的速度提高了1.53到2.14，比以前的3D设计的速度提高了1.27到1.72。我们在多核和多套接字处理器上实现了类似的多程序和多线程工作负载加速。此外，SMART-3D甚至可以降低L2缓存和3D DRAM的能耗，因为它减少了行缓冲区丢失的总数。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

An optimized 3D-stacked memory architecture by exploiting excessive, high-density TSV bandwidth

Memory bandwidth has become a major performance bottleneck as more and more cores are integrated onto a single die, demanding more and more data from the system memory. Several prior studies have demonstrated that this memory bandwidth problem can be addressed by employing a 3D-stacked memory architecture, which provides a wide, high frequency memory-bus interface. Although previous 3D proposals already provide as much bandwidth as a traditional L2 cache can consume, the dense through-silicon-vias (TSVs) of 3D chip stacks can provide still more bandwidth. In this paper, we contest that we need to re-architect our memory hierarchy, including the L2 cache and DRAM interface, so that it can take full advantage of this massive bandwidth. Our technique, SMART-3D, is a new 3D-stacked memory architecture with a vertical L2 fetch/write-back network using a large array of TSVs. Simply stated, we leverage the TSV bandwidth to hide latency behind very large data transfers. We analyze the design trade-offs for the DRAM arrays, careful enough to avoid compromising the DRAM density because of TSV placement. Moreover, we propose an efficient mechanism to manage the false sharing problem when implementing SMART-3D in a multi-socket system. For single-threaded memory-intensive applications, the SMART-3D architecture achieves speedups from 1.53 to 2.14 over planar designs and from 1.27 to 1.72 over prior 3D designs. We achieve similar speedups for multi-program and multi-threaded workloads on multi-core and multi-socket processors. Furthermore, SMART-3D can even lower the energy consumption in the L2 cache and 3D DRAM for it reduces the total number of row buffer misses.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture

自引率

0.00%

发文量