IEEE Transactions on Very Large Scale Integration (VLSI) Systems: Latest Articles

A Chiplet Platform for Intelligent Radar/Sonar Leveraging Domain-Specific Reusable Active Interposer
IF 2.8 · Zone 2, Engineering and Technology
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 4, pp. 903–915 · Pub Date: 2025-01-23 · DOI: 10.1109/TVLSI.2025.3529699
Yafei Liu; Dejian Li; Zheng Yang; Chaoqin Zhang; Yunlai Zhang; Xiangyu Li; Mingwei Cao; Shouyi Yin
Abstract: Through chiplet reuse, chiplet-based system designs have emerged as a cost-effective solution for systems-on-chip (SoCs), yet considerable silicon interposer costs often negate the benefits. Although general reusable interposers (GRIs) can lower the cost, they often compromise performance and energy efficiency. In this article, a domain-specific reusable active interposer (active DSRI) approach is proposed for a better cost-efficiency tradeoff. Moreover, a chiplet platform based on an active DSRI designed for the intelligent radar/sonar (IRS) domain is introduced to facilitate rapid, customized SoC development. The platform offers flexible, energy-efficient interconnections tailored for IRS, together with platform infrastructure functions and peripherals that simplify the chiplets. Furthermore, it integrates lightweight, composable standard 3-D interfaces across the chiplets and interposer, delivering up to 96-Gb/s bandwidth, 11.1-ns latency, and 0.62-pJ/bit energy efficiency, which keeps the cost and power penalties of SoC partitioning under control. A customized hand gesture recognition sonar system (HGRSS) baseband SoC implemented on the proposed platform achieves performance similar to that of a monolithic SoC, with a recognition frame rate of 6286 frames/s, while the 3-D interface overhead is only 6.86% in area and 4.84% in power. The approach proves cost-effective, energy-efficient, and customizable, moving the system-volume breakeven point forward by 3.22–3.36× and reducing cost by 58.5%–59.8%. This represents a pioneering demonstration of reusable chiplets in HGRSS, showcasing the potential of the approach for broader domains.
Citations: 0
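The interface figures above can be related with simple arithmetic: link power is energy per bit times bit rate, and bandwidth times latency gives the number of bits in flight. The sketch below assumes, purely for illustration, that the 0.62-pJ/bit and 11.1-ns figures both hold at the full 96-Gb/s bandwidth; the abstract does not state the operating point of each measurement, and the function names are hypothetical.

```python
def interface_power_mw(bandwidth_gbps: float, energy_pj_per_bit: float) -> float:
    """Return link power in milliwatts.

    Gb/s * pJ/bit = 1e9 * 1e-12 W = 1e-3 W, so the product is already in mW.
    """
    return bandwidth_gbps * energy_pj_per_bit


def bits_in_flight(bandwidth_gbps: float, latency_ns: float) -> float:
    """Bandwidth-delay product: bits in transit during one latency period."""
    # Gb/s * ns = 1e9 * 1e-9 = 1, so the product is a bit count.
    return bandwidth_gbps * latency_ns


if __name__ == "__main__":
    # Assumption: both figures apply simultaneously at full utilization.
    print(f"{interface_power_mw(96, 0.62):.2f} mW")  # 59.52 mW
    print(f"{bits_in_flight(96, 11.1):.1f} bits")    # 1065.6 bits
```

Under these assumptions, a fully loaded interface would draw roughly 60 mW and keep about a kilobit in flight.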
FlooNoC: A 645-Gb/s/link 0.15-pJ/B/hop Open-Source NoC With Wide Physical Links and End-to-End AXI4 Parallel Multistream Support
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 4, pp. 1094–1107 · Pub Date: 2025-01-22 · DOI: 10.1109/TVLSI.2025.3527225
Tim Fischer; Michael Rogenmoser; Thomas Benz; Frank K. Gürkaynak; Luca Benini
Abstract: The new generation of domain-specific AI accelerators is characterized by rapidly increasing demand for bulk data transfers, as opposed to the small, latency-critical cache-line transfers typical of traditional cache-coherent systems. In this article, we address this critical need by introducing the FlooNoC network-on-chip (NoC), featuring very wide, fully Advanced eXtensible Interface (AXI4)-compliant links designed to meet massive bandwidth needs at high energy efficiency. At the transport level, nonblocking transactions are supported for latency tolerance. In addition, a novel end-to-end ordering approach for AXI4, enabled by a multistream-capable direct memory access (DMA) engine, simplifies the network interfaces (NIs) and eliminates interstream dependencies. Furthermore, dedicated physical links are instantiated for short, latency-critical messages. A complete end-to-end reference implementation in 12-nm FinFET technology demonstrates the physical feasibility and power, performance, and area (PPA) benefits of the approach. Using wide links on high metal layers, we achieve a bandwidth of 645 Gb/s per link and a total aggregate bandwidth of 103 Tb/s for an 8 × 4 mesh of processor cluster tiles with a total of 288 RISC-V cores. The NoC imposes an area overhead of only 3.5% per compute tile and achieves a leading-edge energy efficiency of 0.15 pJ/B/hop at 0.8 V. Compared with state-of-the-art (SoA) NoCs, our system offers three times the energy efficiency and more than double the link bandwidth. Furthermore, compared with a traditional AXI4-based multilayer interconnect, our NoC achieves a 30% reduction in area, corresponding to a 47% increase in double-precision GFLOPS within the same floorplan.
Citations: 0
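The 0.15-pJ/B/hop metric scales linearly with both payload size and hop count. The sketch below estimates transfer energy on the 8 × 4 mesh, assuming dimension-ordered (XY) routing so that the hop count equals the Manhattan distance between tiles. The routing algorithm, tile coordinates, and function names are illustrative assumptions, and any router or NI overhead beyond the per-hop figure is ignored.

```python
MESH_COLS, MESH_ROWS = 8, 4        # 8 x 4 mesh of cluster tiles
PJ_PER_BYTE_PER_HOP = 0.15         # energy figure quoted in the abstract (0.8 V)


def xy_hops(src: tuple[int, int], dst: tuple[int, int]) -> int:
    """Hop count under XY routing (assumed): Manhattan distance between tiles."""
    for x, y in (src, dst):
        if not (0 <= x < MESH_COLS and 0 <= y < MESH_ROWS):
            raise ValueError(f"tile {(x, y)} is outside the mesh")
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def transfer_energy_nj(num_bytes: int, hops: int) -> float:
    """Energy in nanojoules to move num_bytes across `hops` router hops."""
    return num_bytes * hops * PJ_PER_BYTE_PER_HOP / 1000.0


if __name__ == "__main__":
    hops = xy_hops((0, 0), (7, 3))  # opposite corners: 7 + 3 = 10 hops
    energy = transfer_energy_nj(64 * 1024, hops)
    print(hops, f"{energy:.1f} nJ")
```

Because energy is linear in hops, placing communicating tiles close together reduces transfer energy proportionally in this model.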
Protecting Analog Circuits Using Switch Mode Time Domain Locking
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 4, pp. 916–928 · Pub Date: 2025-01-22 · DOI: 10.1109/TVLSI.2025.3528320
Utkarsh Kumar; Sudhanshu Khanna; Ankit Mittal; Aatmesh Shrivastava
Abstract: Analog circuits remain vulnerable to various supply-chain attacks, including piracy, overproduction, counterfeiting, and reverse engineering. In this article, we present the switch-mode time-domain locking (SMDL) technique to protect analog circuits. The technique integrates a locking mechanism into the time-domain functionality of the circuit: instead of the fixed clocks conventionally used, it drives the circuit with random, key-dependent switching phases. The key can be made arbitrarily long, and only a correct key (CK), which yields correctly aligned phases, unlocks the circuit functionality. The locking technique can be applied to a variety of switch-mode analog circuits, such as filters, amplifiers, and regulators. We implemented it on a folded cascode amplifier (FCA) and on a switched-capacitor bandgap reference (BGR) circuit, using a 128-bit key to lock the functionality in both cases. The designs are implemented in 65-nm CMOS technology. An incorrect key (IK) introduces almost 100% variation in the circuit functionality, ensuring a high level of security.
Citations: 0
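A toy behavioral model can show why a wrong key degrades the function so strongly. Here each key bit is assumed to set the polarity of one switching phase, and the circuit's effective gain is taken as the average agreement between the phase sequence the circuit expects and the one the applied key produces. This chopper-like abstraction and the names `key_to_polarities` and `effective_gain` are hypothetical; the abstract does not describe SMDL at this level of detail.

```python
import random

KEY_BITS = 128  # key length used in the article's implementations


def key_to_polarities(key: int, n_bits: int = KEY_BITS) -> list[int]:
    """Toy model: map each key bit to a phase polarity (1 -> +1, 0 -> -1)."""
    return [1 if (key >> i) & 1 else -1 for i in range(n_bits)]


def effective_gain(lock_key: int, applied_key: int, n_bits: int = KEY_BITS) -> float:
    """Average agreement between expected and applied phase polarities.

    Aligned phases contribute +1 and misaligned phases -1, so the correct key
    gives a gain of 1.0 while a random key gives a gain near 0.
    """
    expected = key_to_polarities(lock_key, n_bits)
    applied = key_to_polarities(applied_key, n_bits)
    return sum(e * a for e, a in zip(expected, applied)) / n_bits


if __name__ == "__main__":
    rng = random.Random(0)
    lock_key = rng.getrandbits(KEY_BITS)
    wrong_key = rng.getrandbits(KEY_BITS)
    for label, key in (("correct", lock_key), ("random wrong", wrong_key)):
        gain = effective_gain(lock_key, key)
        print(f"{label}: gain={gain:+.3f}, deviation={abs(1 - gain):.0%}")
```

A random wrong key disagrees with the correct one on about half the bits, so in this toy model the gain collapses toward zero, i.e., roughly 100% deviation.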
IEEE Transactions on Very Large Scale Integration (VLSI) Systems Society Information
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 2, p. C3 · Pub Date: 2025-01-22 · DOI: 10.1109/TVLSI.2025.3527804
Open access: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10849954
Citations: 0
A Novel High-Speed Adaptive Duobinary Digital Detector Based on the Feed-Forward Equalizer and the Maximum Likelihood Sequence Detector for Wireline Transceivers
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 4, pp. 1042–1052 · Pub Date: 2025-01-22 · DOI: 10.1109/TVLSI.2025.3528127
Chaolong Xu; Fangxu Lv; Mingche Lai; Xingyun Qi; Qiang Wang; Zhang Luo; Shijie Li; Geng Zhang
Abstract: To address the high bit error rate (BER) of conventional 56-Gb/s nonreturn-to-zero (NRZ) transceivers over high-insertion-loss (IL) channels, this study proposes a high-speed adaptive duobinary (DB) digital detector based on the feed-forward equalizer (FFE) and the maximum likelihood sequence detector (MLSD). In this detector, an adaptive FFE is combined with the channel characteristics to generate DB signals and complete equalization, thereby extending the transmission bandwidth and eye height and tolerating a larger sampling phase offset. A parallel MLSD detects and decodes the DB signals to reduce the BER. An adaptive algorithm is proposed to avoid the long convergence time of the conventional zero-forcing (ZF) algorithm when applied to the DB detector, so that the detector can be used across various bit rates and IL channels. The DB detector is verified at 56 Gb/s. A platform based on a 56-Gb/s analog front-end chip (AFEC) and a field-programmable gate array (FPGA) shows that the detector works well from 12 to 56 Gb/s and over multiple IL channels. The BER is below 2 × 10⁻⁸ at 56 Gb/s over a channel with −42-dB loss at 28 GHz. The structure can also be applied to higher-rate transceivers, such as 112-Gb/s links.
Citations: 0
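The MLSD stage can be illustrated with a textbook Viterbi detector for an ideal 1 + D (duobinary) channel, where each received sample is the sum of the current and previous ±1 symbols. This is a serial, floating-point sketch that assumes a known initial symbol and no residual intersymbol interference beyond 1 + D; it is not the parallel, adaptive hardware described in the article, and all names are hypothetical.

```python
def duobinary_encode(bits: list[int], prev: int = -1) -> list[int]:
    """Ideal 1 + D channel: y[k] = x[k] + x[k-1], with x in {-1, +1}."""
    out = []
    for b in bits:
        x = 1 if b else -1
        out.append(x + prev)
        prev = x
    return out


def viterbi_duobinary(received: list[float], initial: int = -1) -> list[int]:
    """Maximum likelihood sequence detection over the 2-state 1 + D trellis.

    The state is the previous symbol; the branch metric is the squared error
    between the received sample and the expected sum (current + previous).
    """
    states = (-1, 1)
    cost = {s: (0.0 if s == initial else float("inf")) for s in states}
    paths = {s: [] for s in states}
    for r in received:
        new_cost, new_paths = {}, {}
        for cur in states:
            # Accumulated metric for reaching symbol `cur` from each predecessor.
            metrics = {p: cost[p] + (r - (cur + p)) ** 2 for p in states}
            best_prev = min(metrics, key=metrics.get)
            new_cost[cur] = metrics[best_prev]
            new_paths[cur] = paths[best_prev] + [cur]
        cost, paths = new_cost, new_paths
    final = min(states, key=cost.get)
    return [1 if x == 1 else 0 for x in paths[final]]


if __name__ == "__main__":
    bits = [1, 0, 1, 1, 0, 0, 1, 0]
    noise = [0.3, -0.4, 0.2, -0.1, 0.45, -0.3, 0.1, -0.2]
    rx = [y + n for y, n in zip(duobinary_encode(bits), noise)]
    print(viterbi_duobinary(rx) == bits)  # True: |noise| < 0.5 cannot change the ML path
```

Keeping only one survivor path per state is what makes sequence detection tractable: the work per sample is constant regardless of sequence length.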
IEEE Transactions on Very Large Scale Integration (VLSI) Systems Publication Information
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 2, p. C2 · Pub Date: 2025-01-22 · DOI: 10.1109/TVLSI.2024.3523620
Open access: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10849955
Citations: 0
A Low-Cost and Triple-Node-Upset Self-Recoverable Latch Design With Low Soft Error Rate
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 4, pp. 1108–1117 · Pub Date: 2025-01-22 · DOI: 10.1109/TVLSI.2025.3528199
Licai Hao; Lang Tian; Hao Wang; Shiyu Zhao; Qiang Zhao; Chunyu Peng; Chenghu Dai; Zhitin Lin; Xiulong Wu
Abstract: As transistor feature sizes decrease, latches become more sensitive to single-event multiple-node upsets (MNUs), including double-node upsets (DNUs) and triple-node upsets (TNUs). However, reported TNU self-recoverable (TNUR) latches suffer from large area and power consumption. Based on polarity design, this article proposes a low-cost TNUR latch (LCTRL) with a low soft error rate (SER) in 28-nm CMOS technology. The proposed LCTRL mainly consists of four interlocked modules and a clock-gated inverter. Compared with state-of-the-art TNUR latches, including LCTNURL, IHTRL, FATNU, and TRLW, the proposed LCTRL reduces power consumption, D-Q delay, CLK-to-Q delay, area, and the power-delay-area product (PDAP) by 55.09%, 38.64%, 42.93%, 44.65%, and 83.50%, respectively. Owing to the polarity design, the proposed LCTRL has the smallest SER among the compared latches, which suggests that it is suitable for use in radiation environments.
Citations: 0
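Because PDAP is a product of three metrics, individual reductions compound. The sketch below multiplies the remaining fractions of power, D-Q delay, and area, assuming hypothetically that the averaged reductions all apply to one reference latch. The abstract's 83.50% PDAP figure is reported separately and may be averaged differently across the compared latches, so it need not equal this product; the check only shows why the PDAP saving exceeds that of any single metric.

```python
from math import prod


def compound_reduction(reductions_pct: list[float]) -> float:
    """Percentage reduction of a product metric given per-factor reductions.

    Each factor keeps (1 - r/100) of its value, so the product keeps the
    product of those remaining fractions.
    """
    remaining = prod(1 - r / 100 for r in reductions_pct)
    return 100 * (1 - remaining)


if __name__ == "__main__":
    # Power, D-Q delay, and area reductions quoted in the abstract.
    print(f"{compound_reduction([55.09, 38.64, 44.65]):.2f}%")
```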
A 285-nA Quiescent Current, 94.7% Peak Efficiency Buck Converter With AOT Control for IoT Application
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 4, pp. 929–941 · Pub Date: 2025-01-17 · DOI: 10.1109/TVLSI.2025.3527453
Yuxin Zhang; Jueping Cai; Jizhang Chen; Lifeng Jiang; Yixin Yin
Abstract: This article presents an ultralow-quiescent-current dc-dc buck converter based on adaptive on-time (AOT) control. To minimize the energy wasted by the converter while the Internet-of-Things (IoT) device is in standby mode, a control loop with nanoampere quiescent current is proposed. To reduce the quiescent current consumed by the voltage reference and improve its line sensitivity (LS), the voltage reference is preregulated and based on a subthreshold CMOS implementation, with a quiescent current of only 20 nA. Meanwhile, to maintain high efficiency under ultralow loads, an adaptive comparator based on a dynamic bias mode selection circuit is proposed; it converts the load condition into time information and switches the bias current and gain of the comparator under ultralow loads, and the comparator's quiescent current is only 65 nA. The converter is implemented in a 0.18-μm BCD process with an area of 1.35 mm². Experimental results show that the converter has a minimum quiescent current of 285 nA, maintains more than 80% conversion efficiency over a load range of 10 μA–300 mA with a peak efficiency of 94.7%, and provides an output of 0.9–4.8 V from a 2–5.5-V supply.
Citations: 0
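Two textbook relations show why nanoampere quiescent current matters at light load and how AOT control targets a roughly constant switching frequency in continuous conduction. Quiescent current drawn from the input bounds efficiency as η ≤ P_out / (P_out + V_in · I_q), and an ideal buck's on-time for a target frequency is t_on = V_out / (V_in · f_sw), since D = V_out / V_in. The 3.6-V input, 1.8-V output, and 1-MHz target frequency below are hypothetical operating points, conduction and switching losses are ignored, and the function names are illustrative.

```python
def iq_limited_efficiency(v_in: float, v_out: float, i_load: float, i_q: float) -> float:
    """Upper bound on efficiency when the only loss is quiescent current from V_in."""
    p_out = v_out * i_load
    return p_out / (p_out + v_in * i_q)


def aot_on_time(v_in: float, v_out: float, f_sw: float) -> float:
    """Ideal-buck on-time that targets switching frequency f_sw (D = V_out / V_in)."""
    return v_out / (v_in * f_sw)


if __name__ == "__main__":
    V_IN, V_OUT = 3.6, 1.8         # hypothetical operating point
    for i_load in (10e-6, 1e-3):   # 10 uA (bottom of the quoted load range) and 1 mA
        eta = iq_limited_efficiency(V_IN, V_OUT, i_load, 285e-9)
        print(f"I_load={i_load * 1e6:.0f} uA: efficiency bound {eta:.1%}")
    print(f"t_on at 1 MHz: {aot_on_time(V_IN, V_OUT, 1e6) * 1e9:.0f} ns")
```

At a 10-μA load, 285 nA of quiescent current alone already caps efficiency near 95% in this idealized model, which is why the reference and comparator currents are pushed into the tens of nanoamperes.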
Exploiting Chiplet Integration Technology for Fast High-Capacity DRAM Modules
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 5, pp. 1202–1214 · Pub Date: 2025-01-16 · DOI: 10.1109/TVLSI.2025.3527976
Zihan Xia; Chihun Song; Ram Krishna; Ashita Victor; Srujan Penta; Muhannad S. Bakir; Elyse Rosenbaum; Nam Sung Kim; Mingu Kang
Abstract: As the end of Moore's law approaches, chiplet integration technology (or chiplet technology) has emerged to revolutionize future semiconductor chip design. Chiplet technology provides unique advantages over 3-D-stacking technology, including more cost-efficient and thermally friendly integration of heterogeneous technologies. Although chiplet technologies are already used in the latest commercial chips, they have not yet been explored for commodity dynamic random access memory (DRAM) design. Harnessing these advantages for DRAM for the first time, this article evaluates the feasibility of a chiplet-based DRAM architecture, considering the various physical and electrical constraints imposed by a standard chiplet interface [i.e., Universal Chiplet Interconnect Express (UCIe)]. We further explore DIMM architectures that simplify module packaging and assembly, reducing total die size and overall cost. A comprehensive cross-level analysis (i.e., at the device, circuit, chip, and system levels) shows that chiplet-based DRAM reduces t_RCD + t_CAS, the latency-critical DRAM timing parameters, by 1.32×–1.39× at the same energy consumption. In addition, t_RRD improves by 1.39×–2.28×. The reduced DRAM timing parameters improve overall system performance by up to 8.8%–24.7% (geomean 3.4%–8.4%) in real-life benchmarks. The chiplet-based heterogeneous integration achieves a 1.27× higher chip-level yield than the monolithic chip, along with up to a 10% reduction in overall cost compared with traditional DIMMs at emerging process technologies.
Citations: 0
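The abstract reports both a maximum ("up to") and a geometric-mean speedup. The sketch below shows how the two are computed from per-benchmark speedups and how a 1.32× reduction in t_RCD + t_CAS translates into nanoseconds. The per-benchmark speedups and the 13.75-ns baseline timings are hypothetical placeholders, not data from the article.

```python
import math


def geomean(values: list[float]) -> float:
    """Geometric mean, the usual way to average speedup ratios across benchmarks."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geomean needs a non-empty list of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def scaled_latency_ns(baseline_ns: float, speedup: float) -> float:
    """Latency after an N-times reduction (e.g., 1.32x -> baseline / 1.32)."""
    return baseline_ns / speedup


if __name__ == "__main__":
    speedups = [1.02, 1.05, 1.01, 1.20, 1.03]  # hypothetical per-benchmark speedups
    print(f"max {max(speedups) - 1:.1%}, geomean {geomean(speedups) - 1:.1%}")
    t_rcd, t_cas = 13.75, 13.75                 # hypothetical baseline timings (ns)
    print(f"t_RCD + t_CAS: {scaled_latency_ns(t_rcd + t_cas, 1.32):.2f} ns")
```

A single memory-bound benchmark can dominate the maximum while barely moving the geometric mean, which is why the two figures in the abstract differ so much.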
FAMS: A FrAmework of Memory-Centric Mapping for DNNs on Systolic Array Accelerators
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 4, pp. 976–989 · Pub Date: 2025-01-16 · DOI: 10.1109/TVLSI.2024.3522326
Hao Sun; Junzhong Shen; Tian Zhang; Zhongyi Tang; Changwu Zhang; Yuhang Li; Yang Shi; Hengzhu Liu
Abstract: In recent years, deep neural networks (DNNs) have developed rapidly. These DNNs vary significantly in architecture and scale, creating substantial demand for domain-specific accelerators optimized for both high performance and low energy consumption. Owing to their efficient dataflow and parallel processing capabilities, systolic array accelerators offer significant advantages for DNN computation. Existing studies frequently overlook various hardware constraints of systolic array accelerators when representing mapping strategies, such as the differing delays of communication and computation operations and the capacities of multilevel memory hierarchies. Such omissions can lead to inaccurate predictions of accelerator performance and inefficient system designs. We propose the FAMS framework, which introduces a memory-centric notation capable of fully representing the mapping of DNN operations on systolic array accelerators. Memory-centric notation moves away from the idealized assumptions of previous notations and considers various hardware constraints, thereby expanding the effective design and mapping spaces. The FAMS framework also includes a cycle-accurate simulator that takes the hardware configuration, task description, and mapping strategy (expressed in memory-centric notation) as inputs and reports metrics such as latency and energy consumption. Experimental results demonstrate that FAMS reduces latency by up to 29.7% and increases throughput by 42.4% compared with the state-of-the-art TENET framework. Additionally, under hardware configurations with MAC delays of 2 and 3 clock cycles, FAMS improves performance by 12.0% and 25.4%, respectively.
Citations: 0
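To see why ignoring buffer capacity can mislead a mapping model, consider a capacity-constrained tiling search for a GEMM C[M×N] = A[M×K]·B[K×N], the core operation a systolic array executes. The sketch assumes an output-stationary loop order in which each A row panel (Tm × K) stays on-chip while B panels (K × Tn) are streamed, so off-chip traffic is M·K + K·N·⌈M/Tm⌉ + M·N elements. This cost model, the power-of-two candidate tiles, and all names are illustrative assumptions; they are not FAMS's memory-centric notation, which models far more hardware detail.

```python
import math
from itertools import product


def candidate_sizes(dim: int) -> list[int]:
    """Powers of two below dim, plus dim itself."""
    sizes, s = [], 1
    while s < dim:
        sizes.append(s)
        s *= 2
    sizes.append(dim)
    return sizes


def footprint(tm: int, tn: int, k: int) -> int:
    """On-chip elements: A panel (tm x k) + B panel (k x tn) + C tile (tm x tn)."""
    return tm * k + k * tn + tm * tn


def dram_traffic(m: int, n: int, k: int, tm: int) -> int:
    """Off-chip elements moved: A once, B once per row panel, C written once."""
    return m * k + k * n * math.ceil(m / tm) + m * n


def best_tiling(m: int, n: int, k: int, capacity: int) -> tuple[int, int, int]:
    """Return (tm, tn, traffic) minimizing traffic among tiles that fit in capacity."""
    best = None
    for tm, tn in product(candidate_sizes(m), candidate_sizes(n)):
        if footprint(tm, tn, k) > capacity:
            continue  # this tile would overflow the on-chip buffer
        cost = dram_traffic(m, n, k, tm)
        if best is None or cost < best[2]:
            best = (tm, tn, cost)
    if best is None:
        raise ValueError("no tile fits in the given capacity")
    return best


if __name__ == "__main__":
    print(best_tiling(64, 64, 64, capacity=10**6))  # buffer effectively unbounded
    print(best_tiling(64, 64, 64, capacity=2048))   # tight buffer
```

With the tight buffer, the best feasible tile doubles off-chip traffic relative to the unbounded case, the kind of gap a capacity-blind model would miss. The toy cost counts only off-chip traffic, so it is indifferent to Tn; a real mapper would also weigh array utilization and compute latency.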