A 48-Core IA-32 Message-Passing Processor with DVFS in 45nm CMOS

2010 IEEE International Solid-State Circuits Conference (ISSCC), pp. 108-109. Pub Date: 2010-03-18. DOI: 10.1109/ISSCC.2010.5434077

J. Howard, S. Dighe, Y. Hoskote, S. Vangal, D. Finan, G. Ruhl, D. Jenkins, H. Wilson, N. Borkar, G. Schrom, Fabric Pailet, Shailendra Jain, Tiju Jacob, Satish Yada, Sraven Marella, Praveen Salihundam, V. Erraguntla, M. Konow, Michael Riepen, G. Droege, Joerg Lindemann, M. Gries, T. Apel, K. Henriss, Tor Lund-Larsen, Sebastian Steibl, S. Borkar, V. De, R. V. D. Wijngaart, T. Mattson


Citations: 708

Abstract



Current microprocessor design favors increasing core counts over frequency scaling to improve performance and energy efficiency. Coupling this architectural trend with a message-passing protocol helps realize a data-center-on-a-die. The prototype chip (Figs. 5.7.1 and 5.7.7) described in this paper integrates 48 Pentium™-class IA-32 cores [1] on a 6×4 2D-mesh network of tiled core clusters, with high-speed I/Os on the periphery. The chip contains 1.3B transistors. Each core has a private 256KB L2 cache (12MB total on-die) and is optimized to support a message-passing programming model in which cores communicate through shared memory. For increased performance, every tile contains a 16KB message-passing buffer (MPB), for a total of 384KB of on-die shared memory. Power is kept to a minimum by transmitting dynamic, fine-grained voltage-change commands over the network to an on-die voltage-regulator controller (VRC). Further power savings are achieved through active frequency scaling at tile granularity. Memory accesses are distributed over four on-die DDR3 controllers for an aggregate peak memory bandwidth of 21GB/s at 4× burst. Additionally, an 8-byte bidirectional system interface (SIF) provides 6.4GB/s of I/O bandwidth. The die area is 567mm² and the chip is implemented in 45nm high-κ metal-gate CMOS [2].
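The capacity and bandwidth totals quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the function and key names are hypothetical, the two-cores-per-tile figure is derived from 48 cores on 24 tiles rather than stated in the abstract, and the per-controller and per-transfer numbers assume an even split and decimal gigabytes.

```python
def derive_figures():
    """Recompute the abstract's totals from its per-unit figures."""
    # Figures stated in the abstract.
    mesh_cols, mesh_rows = 6, 4
    total_cores = 48
    l2_per_core_kb = 256
    mpb_per_tile_kb = 16
    ddr3_controllers = 4
    peak_mem_bw_gbs = 21.0
    sif_width_bytes = 8
    sif_bw_gbs = 6.4

    tiles = mesh_cols * mesh_rows
    # Derived, not stated: cores sharing one tile (and one MPB).
    cores_per_tile = total_cores // tiles

    return {
        "tiles": tiles,
        "cores_per_tile": cores_per_tile,
        "total_l2_mb": total_cores * l2_per_core_kb / 1024,
        "total_mpb_kb": tiles * mpb_per_tile_kb,
        # Assumption: the MPB is divided evenly between a tile's cores.
        "mpb_per_core_kb": mpb_per_tile_kb / cores_per_tile,
        # Assumption: the aggregate bandwidth splits evenly per controller.
        "mem_bw_per_controller_gbs": peak_mem_bw_gbs / ddr3_controllers,
        # Assumption: 1 GB = 1e9 bytes and every transfer uses all 8 bytes.
        "sif_gtransfers_per_s": sif_bw_gbs / sif_width_bytes,
    }


if __name__ == "__main__":
    for name, value in derive_figures().items():
        print(f"{name}: {value}")
```

The derived 24 tiles, 12MB of L2, and 384KB of MPB agree with the totals in the abstract; the remaining values are implied estimates under the stated assumptions, not figures reported by the authors.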
