A Hierarchical 3-D Physical Design Method for Ultralarge-Scale Logic-on-Memory CGRA Chip

IF 2.8; Zone 2, Engineering & Technology; JCR Q2, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Pub Date: 2025-02-19, DOI: 10.1109/TVLSI.2025.3538883

Zizheng Dong; Shuaipeng Li; Weijia Zhu; Ang Li; Qin Wang; Naifeng Jing; Weiguang Sheng; Jianfei Jiang; Zhigang Mao

Vol. 33, No. 6, pp. 1502-1515. https://ieeexplore.ieee.org/document/10893709/

Citations: 0

Abstract

Face-to-face bonded 3-D (F2F 3D) technology, with the potential to significantly reduce chip area while enhancing performance, stands as one of the most promising ways to extend Moore’s Law. However, current 3-D physical design flows are often modifications of 2-D design flows and rely on technical personnel to manually modify technical files. Furthermore, existing research on 3-D design flow primarily focuses on module implementation, with very few studies addressing hierarchical design methods for large-scale chips. In this article, we first introduce a 3-D physical design flow which concurrently optimizes the timing of both the logic tier and the memory tier, achieving synchronized physical design for both tiers. Then, we develop a bottom-up hierarchical 3-D physical design flow to extend the 3-D design flow to large-scale chip design. Through coordinated power planning, clock tree design, and interconnect unit design, we enhance the power, performance, and area (PPA) metrics of the entire chip. Using our RTL-to-GDS physical design flow, we successfully implemented a 28-nm CMOS logic-on-memory (LoM) 3-D coarse-grained reconfigurable architecture (CGRA) chip with over 50 million gates. Experimental results demonstrate that our 3-D flow improves timing by 16.1% while reducing voltage drop by 38.6% compared to the 2-D design. In addition, the power-delay product (PDP) of the 3-D chip decreases by 10.2%, showcasing better performance.
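The abstract reports the power-delay product (PDP) of the 3-D chip relative to the 2-D design. PDP is power multiplied by delay, so its relative reduction combines both ratios: 1 - (P_3D / P_2D)(T_3D / T_2D). The minimal sketch below illustrates only that arithmetic; the helper names `pdp` and `percent_reduction` and all input values are hypothetical and are not taken from the paper.

```python
def pdp(power_w: float, delay_s: float) -> float:
    """Power-delay product (energy per operation), in joules."""
    return power_w * delay_s


def percent_reduction(baseline: float, improved: float) -> float:
    """Relative reduction of `improved` versus `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline


# Hypothetical numbers for illustration only; NOT the paper's measurements.
pdp_2d = pdp(power_w=2.0, delay_s=1.0e-9)  # 2-D baseline
pdp_3d = pdp(power_w=1.9, delay_s=0.9e-9)  # 3-D: 5% less power, 10% less delay

# Equivalent to 1 - (1.9 / 2.0) * (0.9 / 1.0) = 1 - 0.855 = 0.145
print(f"PDP reduction: {percent_reduction(pdp_2d, pdp_3d):.1f}%")  # PDP reduction: 14.5%
```

Because PDP is a product, a modest power change and a modest delay change compound; a reported PDP figure alone does not say how the improvement splits between power and delay.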


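Source journal

IEEE Transactions on Very Large Scale Integration (VLSI) Systems (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 6.40

Self-citation rate: 7.10%

Articles published: 187

Review time: 3.6 months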

Journal description: The IEEE Transactions on VLSI Systems is published as a monthly journal under the co-sponsorship of the IEEE Circuits and Systems Society, the IEEE Computer Society, and the IEEE Solid-State Circuits Society. Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chip and wafer fabrication, packaging, testing, and systems applications. Generation of specifications, design, and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor, and process levels. The IEEE Transactions on VLSI Systems was founded to address this critical area through a common forum. The editorial board, consisting of international experts, invites original papers that emphasize and merit the novel systems integration aspects of microelectronic systems, including interactions among systems design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chip and wafer fabrication, testing and packaging, and systems-level qualification. Thus, the coverage of these Transactions focuses on VLSI/ULSI microelectronic systems integration.