Efficient spiking convolutional neural networks accelerator with multi-structure compatibility.

IF 3.2; Zone 3 (Medicine); JCR Q2 (NEUROSCIENCES)

Frontiers in Neuroscience; Pub Date: 2025-09-26; eCollection Date: 2025-01-01; DOI: 10.3389/fnins.2025.1662886

Jiadong Wu, Lun Lu, Yinan Wang, Zhiwei Li, Changlin Chen, Qingjiang Li, Kairang Chen

Frontiers in Neuroscience, volume 19, article 1662886. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12511070/pdf/

Citations: 0

Abstract



Spiking Neural Networks (SNNs) offer high computational energy efficiency and biological plausibility. Among them, Spiking Convolutional Neural Networks (SCNNs) have significantly improved performance and show promise for low-power and brain-like computing. To accelerate SCNNs in hardware, we propose an efficient FPGA accelerator architecture with multi-structure compatibility. The architecture supports both traditional convolutional and residual topologies and can be adapted to requirements ranging from small networks to complex ones. It uses a clock-driven scheme that performs convolution and neuron updates on the spike-encoded image at each timestep. Hierarchical pipelining and channel parallelization increase the computation speed of SCNNs. Because current accelerators support only simple networks, the architecture combines configuration and scheduling methods, including grouped reuse computation and line-by-line multi-timestep computation, to accelerate deep networks with many channels and large feature maps. Using the proposed architecture, we evaluated networks at two scales, a small-scale LeNet and a deep residual SCNN, for object detection. Experiments show that the accelerator reaches a maximum recognition speed of 1,605 frames/s at a 100 MHz clock for the LeNet network while consuming only 0.65 mJ per image. Furthermore, combined with the proposed configuration and scheduling methods, the accelerator speeds up each residual module in the deep residual SCNN, processing 2.59 times faster than the CPU at only 16.77% of the CPU's power consumption. These results demonstrate that the proposed accelerator architecture can achieve higher energy efficiency, compatibility, and wider applicability.
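To make the clock-driven scheme in the abstract concrete, here is a minimal software sketch of one spiking convolutional layer that runs a convolution and a neuron update at every timestep. It is an illustration under stated assumptions, not the authors' hardware design: the integrate-and-fire neuron, the hard reset to zero, the threshold value, and the names `conv2d_valid` and `run_clock_driven` are all hypothetical choices that the abstract does not specify.

```python
from typing import List

SpikeMap = List[List[int]]    # binary spikes (0 or 1) for one timestep
Grid = List[List[float]]      # real-valued kernel or membrane potentials


def conv2d_valid(spikes: SpikeMap, kernel: Grid) -> Grid:
    """Stride-1, no-padding 2D cross-correlation of a spike map with a kernel."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(spikes) - kh + 1, len(spikes[0]) - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for i in range(oh):
        for j in range(ow):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    # Inputs are 0/1, so a spike selects a weight to accumulate;
                    # no multiplication is needed.
                    if spikes[i + di][j + dj]:
                        acc += kernel[di][dj]
            out[i][j] = acc
    return out


def run_clock_driven(frames: List[SpikeMap], kernel: Grid,
                     threshold: float = 1.0) -> List[SpikeMap]:
    """Clock-driven update: every timestep is processed in order, whether or
    not any spikes arrived. Neuron model (assumption): integrate-and-fire
    with a hard reset to zero after firing."""
    oh = len(frames[0]) - len(kernel) + 1
    ow = len(frames[0][0]) - len(kernel[0]) + 1
    membrane = [[0.0] * ow for _ in range(oh)]
    outputs: List[SpikeMap] = []
    for spikes in frames:                      # one iteration per timestep
        current = conv2d_valid(spikes, kernel)
        fired = [[0] * ow for _ in range(oh)]
        for i in range(oh):
            for j in range(ow):
                membrane[i][j] += current[i][j]    # integrate input current
                if membrane[i][j] >= threshold:    # fire on threshold crossing
                    fired[i][j] = 1
                    membrane[i][j] = 0.0           # hard reset (assumption)
        outputs.append(fired)
    return outputs


# Usage: the same single input spike is presented for three timesteps.
frame = [[1, 0, 0],
         [0, 0, 0],
         [0, 0, 0]]
kernel = [[0.5, 0.5],
          [0.5, 0.5]]
out = run_clock_driven([frame, frame, frame], kernel, threshold=1.0)
print(out)
```

In this toy run only the output neuron whose window covers the input spike accumulates charge. It crosses the threshold at the second timestep, fires once, and resets. An FPGA design would parallelize the inner loops across channels and pipeline the stages, which this sequential sketch does not attempt to model.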


Source journal

Frontiers in Neuroscience (NEUROSCIENCES)

CiteScore: 6.20
Self-citation rate: 4.70%
Articles published: 2070
Review time: 14 weeks

About the journal: Neural Technology is devoted to the convergence between neurobiology and quantum-, nano- and micro-sciences. In our vision, this interdisciplinary approach should go beyond the technological development of sophisticated methods and should contribute to generating a genuine change in our discipline.