IEEE Transactions on Very Large Scale Integration (VLSI) Systems: Latest Articles

Editorial: Renewed Excellence for 2025–2026
IF 2.8, Zone 2 (Engineering and Technology)
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 3, pp. 603-626. Pub Date: 2025-02-25. DOI: 10.1109/TVLSI.2024.3520396 (https://doi.org/10.1109/TVLSI.2024.3520396). PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10903548
Mircea R. Stan
Abstract: I am happy and honored to have been reappointed as Editor in Chief (EiC) for the IEEE Transactions on VLSI Systems (TVLSI) for another two-year term. As I continue my efforts to improve the quality of the journal, I am grateful for the renewed trust placed in me by the three IEEE sponsoring societies (CASS, SSCS and CS) and by the VLSI community at large. Contrary to a feared slowdown due to increased difficulties with scaling, the field of Very Large Scale Integration (VLSI) has actually grown at an increasingly fast rate as it provides the hardware backbone for the insatiable AI applications which are taking over the world. The H100/200 GPUs, which are essential for AI training, are the largest “conventional” integrated circuits (ICs) with 80 billion transistors, while the wafer-scale WSE2/3, which can provide significant improvements in AI inference, are absolute behemoths with 4 trillion transistors! Mr. Moore can be proud there in heaven for what our industry is able to deliver!
Citations: 0
IEEE Transactions on Very Large Scale Integration (VLSI) Systems Society Information
IF 2.8, Zone 2 (Engineering and Technology)
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 3, p. C3. Pub Date: 2025-02-25. DOI: 10.1109/TVLSI.2025.3539516 (https://doi.org/10.1109/TVLSI.2025.3539516). PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10903516
Citations: 0
Pulse-Based Prebond TSV Testing
IF 2.8, Zone 2 (Engineering and Technology)
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 5, pp. 1215-1223. Pub Date: 2025-02-19. DOI: 10.1109/TVLSI.2025.3534862 (https://doi.org/10.1109/TVLSI.2025.3534862)
Xianrui Dou; Huaguo Liang; Zhengfeng Huang; Yingchun Lu; Tian Chen; Maoxiang Yi
Abstract: Due to the immaturity of the manufacturing process, numerous faults often occur in through-silicon vias (TSVs). Prebond TSV testing is crucial in enhancing the performance and yield of chiplet-based integrated chips. However, most existing test methods suffer from limited test resolution and struggle with hard-to-detect weak faults. A novel pulse-based prebond TSV test method is proposed to improve the test circuit. By introducing a pMOS as a driver in pulse detection, TSV leakage faults can be tested directly, thus improving the resolution of leakage-fault detection. In addition, the range of test-pulsewidth-to-digital-code conversion is effectively improved by using a ring oscillator (RO) for coarse detection and pulse shrinking for fine detection, avoiding the large overhead that would result from solely lengthening the pulse-shrinking chain. HSPICE simulation results show that the method can detect open faults, resistive open faults with $R_{\text{open}} > 0.9~\mathrm{k}\Omega$, leakage faults with $R_{\text{leak}} < 30~\mathrm{G}\Omega$, and compound faults consisting of resistive open faults and leakage faults.
Citations: 0
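The coarse/fine pulsewidth-to-code idea in the abstract above can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: it assumes the coarse stage counts whole ring-oscillator periods inside the pulse and the fine stage counts how many shrinking stages the leftover pulse survives. The function name `pulse_to_code` and all timing values are hypothetical and are not taken from the paper.

```python
def pulse_to_code(width_ps, ro_period_ps, shrink_per_stage_ps, fine_stages):
    """Behavioral model of a two-stage pulsewidth-to-digital conversion.

    Assumptions (illustrative, not from the paper):
      - coarse code = number of whole RO periods that fit in the pulse
      - fine code   = number of shrinking stages the residual pulse survives,
                      saturating at the chain length
    All times are integer picoseconds to avoid floating-point rounding.
    """
    coarse = width_ps // ro_period_ps
    residue_ps = width_ps - coarse * ro_period_ps
    fine = min(residue_ps // shrink_per_stage_ps, fine_stages)
    return coarse, fine


# Hypothetical numbers: 200 ps RO period, 10 ps shrink per stage, 20 stages.
coarse, fine = pulse_to_code(1370, 200, 10, 20)
print(coarse, fine)  # 6 17

# A shrinking chain alone would need 1370 // 10 = 137 stages for this pulse,
# whereas the coarse stage keeps the fine chain at 20 stages in this model.
print(1370 // 10)  # 137
```

In this model the fine chain only has to cover one RO period, which is one way to read the abstract's point about avoiding a long pulse-shrinking chain.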
ISARA: An Island-Style Systolic Array Reconfigurable Accelerator Based on Memristors for Deep Neural Networks
IF 2.8, Zone 2 (Engineering and Technology)
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 4, pp. 963-975. Pub Date: 2025-02-18. DOI: 10.1109/TVLSI.2024.3521394 (https://doi.org/10.1109/TVLSI.2024.3521394)
Fan Yang; Nan Li; Letian Wang; Pinfeng Jiang; Xiangshui Miao; Xingsheng Wang
Abstract: The demand for edge artificial intelligence (AI) is significant, particularly in revolutionary technological areas such as the Internet of Things, autonomous driving, and industrial control. However, reliable and high-performance edge AI is still constrained by computing hardware, and improving the performance and reliability of edge AI accelerators remains a key focus for researchers. This work proposes a memristor/resistive random access memory (RRAM)-based island-style systolic array reconfigurable accelerator (ISARA) that meets the reliability and performance requirements of edge AI. Inspired by the island-style architecture of FPGAs, this work proposes a flexible-tile architecture based on RRAM processing element (PE) islands, optimizing the data flow within the systolic array. The network-on-chip design reduces data processing latency. In addition, to enhance computational efficiency, this work incorporates a bit-fusion scheme within the flexible tile, which reduces analog-to-digital converter (ADC) power consumption and addresses the conductance variation of RRAM. To date, only a few works have completed the entire process from simulation, design, and fabrication to hardware testing. This work fully realizes the design and validation of a new accelerator based on RRAM chips, demonstrating the reliability of RRAM-based systolic array accelerators for the first time. After the algorithms were deployed, the hardware accelerator achieved recognition rates comparable to software. Compared to similar works, ISARA achieves higher computational efficiency and offers flexible reconfigurability. Evaluated on the same deep neural network (DNN) models as other accelerators, ISARA reduces processing latency by 200 times.
Citations: 0
SRAM BL Predriven Write Operation With Row and Voltage Auto-Tracking Replica BL in Resistance-Dominated Technology Nodes
IF 2.8, Zone 2 (Engineering and Technology)
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 5, pp. 1314-1322. Pub Date: 2025-02-18. DOI: 10.1109/TVLSI.2025.3540199 (https://doi.org/10.1109/TVLSI.2025.3540199)
Keonhee Cho; Minjune Yeo; Seungjae Yei; Giseok Kim; Sangyeop Baeck; Seong-Ook Jung
Abstract: In this article, we analyze the effect of the bitline (BL) predriven write operation in alleviating static random access memory (SRAM) writability degradation caused by BL resistance ($R_{\text{BL}}$). In BL predriven write operation, the BL is fully driven to the ground voltage regardless of $R_{\text{BL}}$, and the cell is written by a strong instantaneous peak write current ($I_{\text{write,peak}}$) between the cell and the BL. The writability yield of BL predriven write operation in resistance-dominated technology nodes can thus be significantly improved. In addition, the row and voltage auto-tracking replica BL (RVAT-RepBL) is proposed to generate the BL predriven time ($T_{\text{pre}}$) for BL predriven write operation. In the proposed RVAT-RepBL, $T_{\text{pre}}$ is generated by automatically tracking the variation in the number of rows per BL, $R_{\text{BL}}$, and the supply voltage ($V_{\text{DD}}$). To verify the effect of BL predriven write operation, a test chip was fabricated in 28-nm CMOS technology, and poly resistor arrays were inserted into the cell array to reflect the interconnect resistance of advanced technology nodes. BL predriven write operation has a higher writability yield and a wider operating $V_{\text{DD}}$ range than the conventional write operation. In addition, when a word line (WL) repeater is applied, the writability yield of BL predriven write operation improves further because $I_{\text{write,peak}}$ increases as the WL rising slope improves.
Citations: 0
A 25-GS/s 8-bit Current-Steering DAC With ADC-Based Duty-Cycle Detection in 40-nm CMOS
IF 2.8, Zone 2 (Engineering and Technology)
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 5, pp. 1487-1491. Pub Date: 2025-02-17. DOI: 10.1109/TVLSI.2025.3525706 (https://doi.org/10.1109/TVLSI.2025.3525706)
Xing Li; Lei Zhou; Xuan Guo; Hanbo Jia; Danyu Wu; Jin Wu; Xinyu Liu
Abstract: This brief presents a current-steering 25-GS/s 8-bit digital-to-analog converter (DAC) based on a 40-nm CMOS technology. The DAC employs a dual-edge sampling (DES) architecture to reduce the required main clock frequency, optimizing switching noise and improving power efficiency. DES is sensitive to the clock duty cycle. To minimize the image tones and performance degradation caused by duty-cycle errors, a single-slope analog-to-digital converter (ADC)-based duty-cycle detection and correction scheme is proposed, achieving closed-loop background calibration. A T-coil output network is used to extend the bandwidth. The proposed DAC achieves a spurious-free dynamic range (SFDR) of >40 dBc up to the Nyquist frequency and a >12-GHz output bandwidth with sinc roll-off compensation. The active power consumption is about 272 mW under 1.8-/0.9-/−1.8-V power supplies.
Citations: 0
FANNS: An FPGA-Based Approximate Nearest-Neighbor Search Accelerator
IF 2.8, Zone 2 (Engineering and Technology)
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 4, pp. 1197-1201. Pub Date: 2025-02-13. DOI: 10.1109/TVLSI.2024.3496589 (https://doi.org/10.1109/TVLSI.2024.3496589)
Wei Yuan; Xi Jin
Abstract: Approximate nearest-neighbor search (ANNS) based on high-dimensional vectors has been extensively utilized in data science and neural networks. However, deploying ANNS in production systems requires minimal redundant computation, high recall rates, and low on-chip memory usage, which existing hardware accelerators fail to offer. We propose FANNS, a solution for ANNS based on high-dimensional vectors that can eliminate redundant computations and reuse on-chip data. Extensive evaluations show that FANNS achieves an average of $184.1\times$, $33.0\times$, $2.9\times$, and $2.5\times$ better energy efficiency than CPUs, GPUs, and two state-of-the-art ANNS architectures, i.e., DF-GAS and Vstore, respectively.
Citations: 0
A High-Performance and High-Robustness Triple-Node-Upset Tolerant Latch Based on Redundant-Node Hardening
IF 2.8, Zone 2 (Engineering and Technology)
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 5, pp. 1373-1383. Pub Date: 2025-02-11. DOI: 10.1109/TVLSI.2025.3535926 (https://doi.org/10.1109/TVLSI.2025.3535926)
Qiang Zhao; Qingyi Liu; Xinyi Zhang; Licai Hao; Xin Li; Shengyue Zhang; Chunyu Peng; Zhiting Lin; Xiulong Wu
Abstract: In response to the issues of high cost, large overhead, and limited node fault tolerance in current latch hardening techniques, this article proposes a latch circuit resistant to triple-node-upset (TNU) based on redundant-node hardening technology. This latch comprises eight interlocked 1P2N modules, with its output isolated by two levels of C-elements (CEs), achieving tolerance to TNU. The performance of the redundant-node reinforcement TNU tolerant latch (RNRTTL) was simulated and verified using 65-nm CMOS technology. The simulation results indicate that the RNRTTL circuit has a D-Q delay of 14.14 ps, static power consumption of $4.03~\mu$W, an area of $32.87~\mu\mathrm{m}^2$, and an area-static power-D-Q delay product (APDP) of 1873. Compared to the triple-node-upset tolerant latches TTLL, TNU-latch, TNURL, and HLTNURL reported in the current literature, the proposed latch demonstrates an average reduction of 219.9%, 164.9%, 150.7%, and 2464.8% in D-Q delay, static power consumption, area, and APDP, respectively, indicating that the RNRTTL latch has superior comprehensive performance. Furthermore, a series of 2000 Monte Carlo (MC) simulations on the node group $\langle$Q, X0, X8$\rangle$ reveals that the proposed latch circuit possesses good stability, making it suitable for harsh radiation environments.
Citations: 0
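The APDP value quoted in the latch abstract above can be cross-checked against the other three reported figures. This is a minimal sketch, assuming APDP is the plain product of area (in µm²), static power (in µW), and D-Q delay (in ps) with no additional scaling; the variable names are illustrative.

```python
# Values quoted in the abstract (units assumed: um^2, uW, ps).
area_um2 = 32.87
static_power_uw = 4.03
dq_delay_ps = 14.14

# Assumption: APDP = area x static power x D-Q delay, with no extra scaling.
apdp = area_um2 * static_power_uw * dq_delay_ps
print(round(apdp))  # 1873
```

Under that assumption the product rounds to the APDP of 1873 reported in the abstract.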
A Laddered-Inverter Nonoverlapping Clock Generator
IF 2.8, Zone 2 (Engineering and Technology)
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 5, pp. 1304-1313. Pub Date: 2025-02-10. DOI: 10.1109/TVLSI.2025.3537456 (https://doi.org/10.1109/TVLSI.2025.3537456)
Melvin D. Edwards; Mohammad Alhawari
Abstract: This article presents a novel nonoverlapping clock (NOC) generator based on a laddered inverter (LI) circuit. Unlike conventional approaches, the proposed NOC combines the clock generation and pulsewidth-modulation (PWM) circuit into one integrated architecture, offering lower power consumption, smaller area, and a more robust solution. Furthermore, the proposed NOC offers an inherent guarantee of the nonoverlap (dead time) between the output signals thanks to the guaranteed monotonicity of the LI circuit, thus offering a layout-agnostic design. The proposed NOC can also offer dead-time reconfigurability with the help of multiplexers, allowing both calibration and fine-tuning of the dead times to meet specific requirements. We provide a comprehensive assessment of the proposed NOC through simulation and measurement results in 65-nm CMOS. Measured results show that the proposed NOC consumes $1~\mu$W at 5 MHz with a 1-V supply, achieving more than $10\times$ lower power consumption and 25% smaller area compared to the conventional NOC circuit. Simulations show that the proposed NOC can generate waveforms with frequencies up to 3.5 GHz at a 1.2-V supply.
Citations: 0
Strassen Multisystolic Array Hardware Architectures
IF 2.8, Zone 2 (Engineering and Technology)
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 5, pp. 1323-1333. Pub Date: 2025-02-10. DOI: 10.1109/TVLSI.2025.3530785 (https://doi.org/10.1109/TVLSI.2025.3530785)
Trevor E. Pogue; Nicola Nicolici
Abstract: While Strassen’s matrix multiplication algorithm reduces the complexity of naive matrix multiplication, general-purpose hardware is not suitable for achieving the algorithm’s promised theoretical speedups. This leaves the question of whether it could be better exploited in custom hardware architectures designed specifically for executing the algorithm. However, there is limited prior work on this, and it is not immediately clear how to derive such architectures or whether they can ultimately lead to real improvements. We bridge this gap, presenting and evaluating new systolic array architectures that efficiently translate the theoretical complexity reductions of Strassen’s algorithm directly into hardware resource savings. Furthermore, the architectures are multisystolic array designs that can multiply smaller matrices with higher utilization than single-systolic array designs. The proposed designs implemented on FPGA reduce DSP requirements by a factor of $1.14^{r}$ for $r$ implemented Strassen recursion levels, and otherwise require overall similar soft logic resources when instantiated to support matrix sizes down to $32\times 32$ and $24\times 24$ at one to two levels of Strassen recursion, respectively. We evaluate the proposed designs both in isolation and in an end-to-end machine learning accelerator, comparing them with baseline designs and prior works and achieving state-of-the-art performance.
Citations: 0
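The $1.14^{r}$ factor in the Strassen abstract above is consistent with the algorithm replacing 8 block multiplications with 7 at each recursion level (8/7 ≈ 1.14). That link is an interpretation, not a statement from the abstract. The sketch below is a plain software illustration of Strassen's recursion with a scalar-multiplication counter; it does not model the paper's systolic-array hardware, and the helper names (`strassen`, `naive_mul`, and so on) are illustrative.

```python
def mat_add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mat_sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def naive_mul(a, b, counter):
    """Schoolbook product of square matrices; counter[0] counts scalar multiplies."""
    n = len(a)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                out[i][j] += a[i][k] * b[k][j]
                counter[0] += 1
    return out


def split(m):
    """Split an even-sized square matrix into four quadrants."""
    h = len(m) // 2
    return ([r[:h] for r in m[:h]], [r[h:] for r in m[:h]],
            [r[:h] for r in m[h:]], [r[h:] for r in m[h:]])


def strassen(a, b, levels, counter):
    """Apply `levels` Strassen recursion levels, then fall back to naive_mul."""
    if levels == 0 or len(a) % 2:
        return naive_mul(a, b, counter)
    a11, a12, a21, a22 = split(a)
    b11, b12, b21, b22 = split(b)
    nxt = levels - 1
    # Seven block products instead of the eight a block-wise schoolbook product needs.
    m1 = strassen(mat_add(a11, a22), mat_add(b11, b22), nxt, counter)
    m2 = strassen(mat_add(a21, a22), b11, nxt, counter)
    m3 = strassen(a11, mat_sub(b12, b22), nxt, counter)
    m4 = strassen(a22, mat_sub(b21, b11), nxt, counter)
    m5 = strassen(mat_add(a11, a12), b22, nxt, counter)
    m6 = strassen(mat_sub(a21, a11), mat_add(b11, b12), nxt, counter)
    m7 = strassen(mat_sub(a12, a22), mat_add(b21, b22), nxt, counter)
    c11 = mat_add(mat_sub(mat_add(m1, m4), m5), m7)
    c12 = mat_add(m3, m5)
    c21 = mat_add(m2, m4)
    c22 = mat_add(mat_add(mat_sub(m1, m2), m3), m6)
    top = [l + r for l, r in zip(c11, c12)]
    bottom = [l + r for l, r in zip(c21, c22)]
    return top + bottom


a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
b = [[2, 0, 1, 3], [1, 1, 0, 2], [0, 4, 2, 1], [3, 2, 1, 0]]
for r in range(3):
    count = [0]
    strassen(a, b, r, count)
    print(r, count[0])  # 0 64, then 1 56, then 2 49
```

For these 4×4 inputs the multiplication count falls from 64 to 56 to 49, a factor of 8/7 per recursion level, which is the kind of saving the abstract reports as reduced DSP usage.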