{"title":"Advancing Neuromorphic Architecture Toward Emerging Spiking Neural Network on FPGA","authors":"Yingxue Gao;Teng Wang;Yang Yang;Lei Gong;Xianglan Chen;Chao Wang;Xi Li;Xuehai Zhou","doi":"10.1109/TCAD.2025.3547275","DOIUrl":null,"url":null,"abstract":"Spiking neural networks (SNNs) replace the multiply-and-accumulate operations in traditional artificial neural networks (ANNs) with lightweight mask-and-accumulate operations, achieving greater performance. Existing SNN architectures are primarily designed based on fully-connected or convolutional SNN topologies and still struggle with low task accuracy, limiting their practical applications. Recently, transformer SNN (TSNN) models have shown promise in matching the accuracy of nonspiking ANNs and demonstrated potential application prospects. However, their diverse computation pattern and sophisticated network structure with high computation and memory footprints impede their efficient deployment. Thus, in this work, we move our attention to heterogeneous architecture design and propose SpikeTA, the first neuromorphic hardware accelerator explicitly designed for the TSNN model on FPGA. First, SpikeTA enables parameterizable hardware engines (HEs) designed for the network layers in TSNN, enhancing compatibility between HEs and network layers. Second, SpikeTA optimizes arithmetic operations between binary spikes and synaptic weights by presenting a DSP-efficient addition tree. By analyzing the inherent data characteristics, SpikeTA further introduces a depth-aware buffer management strategy to provide sufficient access ports. Third, SpikeTA employs a streaming dataflow mapping to optimize data transmission granularity and leverages a split-engine dataflow mapping to facilitate pipelined latency balancing. Experimental results demonstrate that SpikeTA achieves significant performance speedups of <inline-formula> <tex-math>$140.73\\times $ </tex-math></inline-formula>–<inline-formula> <tex-math>$1023.53\\times $ </tex-math></inline-formula> and <inline-formula> <tex-math>$2.97\\times $ </tex-math></inline-formula>–<inline-formula> <tex-math>$7.29\\times $ </tex-math></inline-formula> over architectures running on the AMD EPYC 7542 CPU and NVIDIA A100 GPU, respectively. SpikeTA also outperforms state-of-the-art SNN and Transformer accelerators by <inline-formula> <tex-math>$2.79\\times $ </tex-math></inline-formula> and <inline-formula> <tex-math>$2.66\\times $ </tex-math></inline-formula> in architecture performance while achieving a peak performance of 28.99 TOPs.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 9","pages":"3465-3478"},"PeriodicalIF":2.9000,"publicationDate":"2025-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10908632/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Citations: 0
Abstract
Spiking neural networks (SNNs) replace the multiply-and-accumulate operations in traditional artificial neural networks (ANNs) with lightweight mask-and-accumulate operations, achieving greater performance. Existing SNN architectures are primarily designed based on fully-connected or convolutional SNN topologies and still struggle with low task accuracy, limiting their practical applications. Recently, transformer SNN (TSNN) models have shown promise in matching the accuracy of nonspiking ANNs and demonstrated potential application prospects. However, their diverse computation patterns and sophisticated network structure with high computation and memory footprints impede their efficient deployment. Thus, in this work, we move our attention to heterogeneous architecture design and propose SpikeTA, the first neuromorphic hardware accelerator explicitly designed for the TSNN model on FPGA. First, SpikeTA enables parameterizable hardware engines (HEs) designed for the network layers in TSNN, enhancing compatibility between HEs and network layers. Second, SpikeTA optimizes arithmetic operations between binary spikes and synaptic weights by presenting a DSP-efficient addition tree. By analyzing the inherent data characteristics, SpikeTA further introduces a depth-aware buffer management strategy to provide sufficient access ports. Third, SpikeTA employs a streaming dataflow mapping to optimize data transmission granularity and leverages a split-engine dataflow mapping to facilitate pipelined latency balancing. Experimental results demonstrate that SpikeTA achieves significant performance speedups of $140.73\times$–$1023.53\times$ and $2.97\times$–$7.29\times$ over architectures running on the AMD EPYC 7542 CPU and NVIDIA A100 GPU, respectively. SpikeTA also outperforms state-of-the-art SNN and Transformer accelerators by $2.79\times$ and $2.66\times$ in architecture performance while achieving a peak performance of 28.99 TOPs.
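To illustrate the mask-and-accumulate idea the abstract refers to, the following minimal C++ sketch contrasts a conventional multiply-and-accumulate dot product with the spike-driven variant: because each spike is binary, the multiply degenerates into a conditional add, which is what allows SNN hardware to trade DSP multipliers for adders. This is not the paper's implementation; the function names, the fan-in size, and the example values are illustrative assumptions.

```cpp
#include <array>
#include <cstdint>
#include <iostream>

// Illustrative fan-in; not a parameter taken from the paper.
constexpr std::size_t kFanIn = 8;

// Conventional ANN neuron input: real-valued activations require a multiplier.
float mac_dot(const std::array<float, kFanIn>& act,
              const std::array<float, kFanIn>& weight) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < kFanIn; ++i) {
        acc += act[i] * weight[i];  // multiply-and-accumulate
    }
    return acc;
}

// SNN neuron input: binary spikes merely select which weights are summed.
float mask_accumulate(const std::array<std::uint8_t, kFanIn>& spike,
                      const std::array<float, kFanIn>& weight) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < kFanIn; ++i) {
        if (spike[i]) {        // spike is 0 or 1, so no multiplier is needed
            acc += weight[i];  // mask-and-accumulate
        }
    }
    return acc;
}

int main() {
    std::array<std::uint8_t, kFanIn> spikes = {1, 0, 1, 1, 0, 0, 1, 0};
    std::array<float, kFanIn> weights = {0.5f, -0.2f, 0.3f, 0.1f,
                                         0.7f, -0.4f, 0.2f, 0.6f};
    std::cout << "membrane input: " << mask_accumulate(spikes, weights) << "\n";
    return 0;
}
```

In hardware terms, the conditional add in the second loop is what an addition-tree datapath can implement with plain adders gated by spike bits, rather than with DSP multipliers, which is the kind of saving the abstract's DSP-efficient addition tree targets.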
Journal Introduction:
The purpose of this Transactions is to publish papers of interest to individuals in the area of computer-aided design of integrated circuits and systems composed of analog, digital, mixed-signal, optical, or microwave components. The aids include methods, models, algorithms, and man-machine interfaces for system-level, physical and logical design including: planning, synthesis, partitioning, modeling, simulation, layout, verification, testing, hardware-software co-design and documentation of integrated circuit and system designs of all complexities. Design tools and techniques for evaluating and designing integrated circuits and systems for metrics such as performance, power, reliability, testability, and security are a focus.