PicoServer: using 3D stacking technology to enable a compact energy efficient chip multiprocessor

ASPLOS XII. Pub Date: 2006-10-23. DOI: 10.1145/1168857.1168873
Taeho Kgil, Shaun C. D'Souza, A. Saidi, N. Binkert, R. Dreslinski, T. Mudge, S. Reinhardt, K. Flautner
Citations: 222

Abstract

In this paper, we show how 3D stacking technology can be used to implement a simple, low-power, high-performance chip multiprocessor suitable for throughput processing. Our proposed architecture, PicoServer, employs 3D technology to bond one die containing several simple, slow processing cores to multiple DRAM dies sufficient for a primary memory. The 3D technology also enables wide, low-latency buses between processors and memory. These buses remove the need for an L2 cache, allowing its area to be re-allocated to additional simple cores. The additional cores allow the clock frequency to be lowered without impairing throughput. Lower clock frequency in turn reduces power and means that thermal constraints, a concern with 3D stacking, are easily satisfied.

The PicoServer architecture specifically targets Tier 1 server applications, which exhibit a high degree of thread-level parallelism. An architecture targeted at efficient throughput is ideal for this application domain. We find that, for a similar logic die area, a 12-CPU system with 3D stacking and no L2 cache outperforms an 8-CPU system with a large on-chip L2 cache by about 14% while consuming 55% less power. In addition, we show that a PicoServer performs comparably to a Pentium 4-class machine while consuming only about 1/10 of the power, even when conservative assumptions are made about the power consumption of the PicoServer.
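The abstract's central argument, that extra cores let the clock frequency drop without losing throughput and that a lower clock in turn cuts power, can be illustrated with a first-order model. This is a minimal sketch, not the authors' evaluation methodology. It assumes perfect thread-level parallelism, equal IPC for every core, dynamic power P = N·C·V²·f, and a supply voltage that scales linearly with frequency. It ignores leakage, the removed L2 cache, and the stacked DRAM. The names `CmpConfig`, `throughput`, `dynamic_power`, and `scaled` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CmpConfig:
    """Hypothetical first-order description of a chip multiprocessor."""
    cores: int
    freq_ghz: float   # per-core clock frequency
    vdd: float        # supply voltage (assumed to track frequency)
    ipc: float = 1.0  # instructions per cycle per core (assumed equal for all configs)


def throughput(cfg: CmpConfig) -> float:
    """Aggregate throughput in billions of instructions per second.

    Assumes the workload has enough thread-level parallelism to keep every
    core busy, and ignores memory stalls and contention.
    """
    return cfg.cores * cfg.freq_ghz * cfg.ipc


def dynamic_power(cfg: CmpConfig, c_eff: float = 1.0) -> float:
    """Relative dynamic power, P = N * C_eff * Vdd^2 * f (arbitrary units)."""
    return cfg.cores * c_eff * cfg.vdd ** 2 * cfg.freq_ghz


def scaled(base: CmpConfig, cores: int) -> CmpConfig:
    """Return a config with `cores` cores and the same aggregate throughput as `base`.

    Frequency is lowered by the factor base.cores / cores. Voltage is assumed,
    idealistically, to scale down by the same factor.
    """
    ratio = base.cores / cores
    return CmpConfig(cores=cores,
                     freq_ghz=base.freq_ghz * ratio,
                     vdd=base.vdd * ratio,
                     ipc=base.ipc)


if __name__ == "__main__":
    base = CmpConfig(cores=8, freq_ghz=1.0, vdd=1.0)
    wide = scaled(base, cores=12)
    print(f"throughput: {throughput(base):.2f} vs {throughput(wide):.2f}")
    print(f"power ratio (12 cores / 8 cores): "
          f"{dynamic_power(wide) / dynamic_power(base):.3f}")
```

Under these assumptions, spreading the work of 8 cores across 12 keeps aggregate throughput unchanged. Meanwhile, the 12-core configuration draws about 4/9 of the baseline dynamic power, because per-core power falls with the cube of the frequency ratio. Real supply voltages cannot scale down indefinitely, so the saving shrinks as frequency approaches that floor. The paper's reported 14% and 55% figures come from its own evaluation, not from a model like this one.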