Marco Bertuletti;Yichao Zhang;Alessandro Vanelli-Coralli;Luca Benini
{"title":"A 66-Gb/s/5.5-W RISC-V Many-Core Cluster for 5G+ Software-Defined Radio Uplinks","authors":"Marco Bertuletti;Yichao Zhang;Alessandro Vanelli-Coralli;Luca Benini","doi":"10.1109/TVLSI.2025.3576855","DOIUrl":null,"url":null,"abstract":"Following the scale-up of new radio (NR) complexity in 5G and beyond, the physical layer’s computing load on base stations is increasing under a strictly constrained latency and power budget; base stations must process <inline-formula> <tex-math>$\\gt$ </tex-math></inline-formula> 20-Gb/s uplink wireless data rate on the fly, in <inline-formula> <tex-math>$\\lt$ </tex-math></inline-formula> 10 W. At the same time, the programmability and reconfigurability of base station components are the key requirements; it reduces the time and cost of new networks’ deployment, it lowers the acceptance threshold for industry players to enter the market, and it ensures return on investments in a fast-paced evolution of standards. In this article, we present the design of a many-core cluster for 5G and beyond base station processing. Our design features 1024, streamlined RISC-V cores with domain-specific FP extensions, and 4-MiB shared memory. It provides the necessary computational capabilities for software-defined processing of the lower physical layer of 5G physical uplink shared channel (PUSCH), satisfying high-end throughput requirements (66 Gb/s for a transition time interval (TTI), 9.4–302 Gb/s depending on the processing stage). The throughput metrics for the implemented functions are ten times higher than in state-of-the-art (SoTA) application-specific instruction processors (ASIPs). The energy efficiency on key NR kernels (2–41 Gb/s/W), measured at 800 MHz, <inline-formula> <tex-math>${25}~^{\\circ } $ </tex-math></inline-formula>C, and 0.8 V, on a placed and routed instance in 12-nm CMOS technology, is competitive with SoTA architectures. The PUSCH processing runs end-to-end on a single cluster in 1.7 ms, at <6-W average power consumption, achieving 12 Gb/s/W.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 8","pages":"2225-2238"},"PeriodicalIF":3.1000,"publicationDate":"2025-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/11038837/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
引用次数: 0
Abstract
Following the scale-up of new radio (NR) complexity in 5G and beyond, the physical layer’s computing load on base stations is increasing under a strictly constrained latency and power budget; base stations must process $\gt$ 20-Gb/s uplink wireless data rate on the fly, in $\lt$ 10 W. At the same time, the programmability and reconfigurability of base station components are the key requirements; it reduces the time and cost of new networks’ deployment, it lowers the acceptance threshold for industry players to enter the market, and it ensures return on investments in a fast-paced evolution of standards. In this article, we present the design of a many-core cluster for 5G and beyond base station processing. Our design features 1024, streamlined RISC-V cores with domain-specific FP extensions, and 4-MiB shared memory. It provides the necessary computational capabilities for software-defined processing of the lower physical layer of 5G physical uplink shared channel (PUSCH), satisfying high-end throughput requirements (66 Gb/s for a transition time interval (TTI), 9.4–302 Gb/s depending on the processing stage). The throughput metrics for the implemented functions are ten times higher than in state-of-the-art (SoTA) application-specific instruction processors (ASIPs). The energy efficiency on key NR kernels (2–41 Gb/s/W), measured at 800 MHz, ${25}~^{\circ } $ C, and 0.8 V, on a placed and routed instance in 12-nm CMOS technology, is competitive with SoTA architectures. The PUSCH processing runs end-to-end on a single cluster in 1.7 ms, at <6-W average power consumption, achieving 12 Gb/s/W.
期刊介绍:
The IEEE Transactions on VLSI Systems is published as a monthly journal under the co-sponsorship of the IEEE Circuits and Systems Society, the IEEE Computer Society, and the IEEE Solid-State Circuits Society.
Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing and systems applications. Generation of specifications, design and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor and process levels.
To address this critical area through a common forum, the IEEE Transactions on VLSI Systems have been founded. The editorial board, consisting of international experts, invites original papers which emphasize and merit the novel systems integration aspects of microelectronic systems including interactions among systems design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chips and wafer fabrication, testing and packaging, and systems level qualification. Thus, the coverage of these Transactions will focus on VLSI/ULSI microelectronic systems integration.