{"title":"互连主导多核3D-IC的带宽-延迟-热协同优化","authors":"Sudipta Das;Samuel Riedel;Mohamed Naeim;Moritz Brunion;Marco Bertuletti;Luca Benini;Julien Ryckaert;James Myers;Dwaipayan Biswas;Dragomir Milojevic","doi":"10.1109/TVLSI.2024.3467148","DOIUrl":null,"url":null,"abstract":"The ongoing integration of advanced functionalities in contemporary system-on-chips (SoCs) poses significant challenges related to memory bandwidth, capacity, and thermal stability. These challenges are further amplified with the advancement of artificial intelligence (AI), necessitating enhanced memory and interconnect bandwidth and latency. This article presents a comprehensive study encompassing architectural modifications of an interconnect-dominated many-core SoC targeting the significant increase of intermediate, on-chip cache memory bandwidth and access latency tuning. The proposed SoC has been implemented in 3-D using A10 nanosheet technology and early thermal analysis has been performed. Our workload simulations reveal, respectively, up to 12- and 2.5-fold acceleration in the 64-core and 16-core versions of the SoC. Such speed-up comes at 40% increase in die-area and a 60% rise in power dissipation when implemented in 2-D. In contrast, the 3-D counterpart not only minimizes the footprint but also yields 20% power savings, attributable to a 40% reduction in wirelength. The article further highlights the importance of pipeline restructuring to leverage the potential of 3-D technology for achieving lower latency and more efficient memory access. Finally, we discuss the thermal implications of various 3-D partitioning schemes in High Performance Computing (HPC) and mobile applications. Our analysis reveals that, unlike high-power density HPC cases, 3-D mobile case increases <inline-formula> <tex-math>$T_{\\max }$ </tex-math></inline-formula> only by <inline-formula> <tex-math>$2~^{\\circ } $ </tex-math></inline-formula>C–<inline-formula> <tex-math>$3~^{\\circ } $ </tex-math></inline-formula>C compared to 2-D, while the HPC scenario analysis requires multiconstrained efficient partitioning for 3-D implementations.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 2","pages":"346-357"},"PeriodicalIF":2.8000,"publicationDate":"2024-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Bandwidth-Latency-Thermal Co-Optimization of Interconnect-Dominated Many-Core 3D-IC\",\"authors\":\"Sudipta Das;Samuel Riedel;Mohamed Naeim;Moritz Brunion;Marco Bertuletti;Luca Benini;Julien Ryckaert;James Myers;Dwaipayan Biswas;Dragomir Milojevic\",\"doi\":\"10.1109/TVLSI.2024.3467148\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The ongoing integration of advanced functionalities in contemporary system-on-chips (SoCs) poses significant challenges related to memory bandwidth, capacity, and thermal stability. These challenges are further amplified with the advancement of artificial intelligence (AI), necessitating enhanced memory and interconnect bandwidth and latency. This article presents a comprehensive study encompassing architectural modifications of an interconnect-dominated many-core SoC targeting the significant increase of intermediate, on-chip cache memory bandwidth and access latency tuning. The proposed SoC has been implemented in 3-D using A10 nanosheet technology and early thermal analysis has been performed. Our workload simulations reveal, respectively, up to 12- and 2.5-fold acceleration in the 64-core and 16-core versions of the SoC. Such speed-up comes at 40% increase in die-area and a 60% rise in power dissipation when implemented in 2-D. In contrast, the 3-D counterpart not only minimizes the footprint but also yields 20% power savings, attributable to a 40% reduction in wirelength. The article further highlights the importance of pipeline restructuring to leverage the potential of 3-D technology for achieving lower latency and more efficient memory access. Finally, we discuss the thermal implications of various 3-D partitioning schemes in High Performance Computing (HPC) and mobile applications. Our analysis reveals that, unlike high-power density HPC cases, 3-D mobile case increases <inline-formula> <tex-math>$T_{\\\\max }$ </tex-math></inline-formula> only by <inline-formula> <tex-math>$2~^{\\\\circ } $ </tex-math></inline-formula>C–<inline-formula> <tex-math>$3~^{\\\\circ } $ </tex-math></inline-formula>C compared to 2-D, while the HPC scenario analysis requires multiconstrained efficient partitioning for 3-D implementations.\",\"PeriodicalId\":13425,\"journal\":{\"name\":\"IEEE Transactions on Very Large Scale Integration (VLSI) Systems\",\"volume\":\"33 2\",\"pages\":\"346-357\"},\"PeriodicalIF\":2.8000,\"publicationDate\":\"2024-10-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Very Large Scale Integration (VLSI) Systems\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10720515/\",\"RegionNum\":2,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10720515/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
引用次数: 0
摘要
现代系统芯片(soc)不断集成高级功能,对内存带宽、容量和热稳定性提出了重大挑战。随着人工智能(AI)的进步,这些挑战进一步放大,需要增强内存、互连带宽和延迟。本文提出了一项全面的研究,包括互连主导的多核SoC的架构修改,目标是显著增加中间,片上缓存存储器带宽和访问延迟调优。所提出的SoC已经使用A10纳米片技术实现了3d,并进行了早期热分析。我们的工作负载模拟显示,64核和16核版本的SoC分别具有高达12倍和2.5倍的加速。当在2d中实现时,这种加速带来了40%的模面积增加和60%的功耗增加。相比之下,3d打印不仅可以最大限度地减少占地面积,还可以节省20%的电力,因为无线长度减少了40%。本文进一步强调了管道重组的重要性,以利用3d技术的潜力来实现更低的延迟和更高效的内存访问。最后,我们讨论了各种三维分区方案在高性能计算(HPC)和移动应用中的热影响。我们的分析表明,与高功率密度HPC情况不同,3- d移动情况下,与2- d相比,$T_{\max}$仅增加$2~^{\circ}$ C - $3~^{\circ}$ C,而HPC场景分析需要多约束的高效分区来实现3- d实现。
Bandwidth-Latency-Thermal Co-Optimization of Interconnect-Dominated Many-Core 3D-IC
The ongoing integration of advanced functionalities in contemporary system-on-chips (SoCs) poses significant challenges related to memory bandwidth, capacity, and thermal stability. These challenges are further amplified with the advancement of artificial intelligence (AI), necessitating enhanced memory and interconnect bandwidth and latency. This article presents a comprehensive study encompassing architectural modifications of an interconnect-dominated many-core SoC targeting the significant increase of intermediate, on-chip cache memory bandwidth and access latency tuning. The proposed SoC has been implemented in 3-D using A10 nanosheet technology and early thermal analysis has been performed. Our workload simulations reveal, respectively, up to 12- and 2.5-fold acceleration in the 64-core and 16-core versions of the SoC. Such speed-up comes at 40% increase in die-area and a 60% rise in power dissipation when implemented in 2-D. In contrast, the 3-D counterpart not only minimizes the footprint but also yields 20% power savings, attributable to a 40% reduction in wirelength. The article further highlights the importance of pipeline restructuring to leverage the potential of 3-D technology for achieving lower latency and more efficient memory access. Finally, we discuss the thermal implications of various 3-D partitioning schemes in High Performance Computing (HPC) and mobile applications. Our analysis reveals that, unlike high-power density HPC cases, 3-D mobile case increases $T_{\max }$ only by $2~^{\circ } $ C–$3~^{\circ } $ C compared to 2-D, while the HPC scenario analysis requires multiconstrained efficient partitioning for 3-D implementations.
期刊介绍:
The IEEE Transactions on VLSI Systems is published as a monthly journal under the co-sponsorship of the IEEE Circuits and Systems Society, the IEEE Computer Society, and the IEEE Solid-State Circuits Society.
Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing and systems applications. Generation of specifications, design and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor and process levels.
To address this critical area through a common forum, the IEEE Transactions on VLSI Systems have been founded. The editorial board, consisting of international experts, invites original papers which emphasize and merit the novel systems integration aspects of microelectronic systems including interactions among systems design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chips and wafer fabrication, testing and packaging, and systems level qualification. Thus, the coverage of these Transactions will focus on VLSI/ULSI microelectronic systems integration.