SpQuant-SNN: ultra-low precision membrane potential with sparse activations unlock the potential of on-device spiking neural networks applications

IF 4.3 · Q1, Engineering, Electrical & Electronic
Ahmed Hasssan, Jian Meng, Anupreetham Anupreetham, Jae-sun Seo
{"title":"SpQuant-SNN: ultra-low precision membrane potential with sparse activations unlock the potential of on-device spiking neural networks applications","authors":"Ahmed Hasssan, Jian Meng, Anupreetham Anupreetham, Jae-sun Seo","doi":"10.3389/fnins.2024.1440000","DOIUrl":null,"url":null,"abstract":"Spiking neural networks (SNNs) have received increasing attention due to their high biological plausibility and energy efficiency. The binary spike-based information propagation enables efficient sparse computation in event-based and static computer vision applications. However, the weight precision and especially the membrane potential precision remain as high-precision values (e.g., 32 bits) in state-of-the-art SNN algorithms. Each neuron in an SNN stores the membrane potential over time and typically updates its value in every time step. Such frequent read/write operations of high-precision membrane potential incur storage and memory access overhead in SNNs, which undermines the SNNs' compatibility with resource-constrained hardware. To resolve this inefficiency, prior works have explored the time step reduction and low-precision representation of membrane potential at a limited scale and reported significant accuracy drops. Furthermore, while recent advances in on-device AI present pruning and quantization optimization with different architectures and datasets, simultaneous pruning with quantization is highly under-explored in SNNs. In this work, we present <jats:italic>SpQuant-SNN</jats:italic>, a fully-quantized spiking neural network with <jats:italic>ultra-low precision weights, membrane potential, and high spatial-channel sparsity</jats:italic>, enabling the end-to-end low precision with significantly reduced operations on SNN. First, we propose an integer-only quantization scheme for the membrane potential with a stacked surrogate gradient function, a simple-yet-effective method that enables the smooth learning process of quantized SNN training. Second, we implement spatial-channel pruning with membrane potential prior, toward reducing the layer-wise computational complexity, and floating-point operations (FLOPs) in SNNs. Finally, to further improve the accuracy of low-precision and sparse SNN, we propose a self-adaptive learnable potential threshold for SNN training. Equipped with high biological adaptiveness, minimal computations, and memory utilization, SpQuant-SNN achieves state-of-the-art performance across multiple SNN models for both event-based and static image datasets, including both image classification and object detection tasks. The proposed SpQuant-SNN achieved up to 13× memory reduction and &amp;gt;4.7× FLOPs reduction with &amp;lt; 1.8% accuracy degradation for both classification and object detection tasks, compared to the SOTA baseline.","PeriodicalId":3,"journal":{"name":"ACS Applied Electronic Materials","volume":null,"pages":null},"PeriodicalIF":4.3000,"publicationDate":"2024-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACS Applied Electronic Materials","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.3389/fnins.2024.1440000","RegionNum":3,"RegionCategory":"材料科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0

Abstract

Spiking neural networks (SNNs) have received increasing attention due to their high biological plausibility and energy efficiency. Binary spike-based information propagation enables efficient sparse computation in event-based and static computer vision applications. However, the weight precision and especially the membrane potential precision remain high (e.g., 32 bits) in state-of-the-art SNN algorithms. Each neuron in an SNN stores a membrane potential over time and typically updates it at every time step. Such frequent read/write operations on a high-precision membrane potential incur storage and memory-access overhead, which undermines the compatibility of SNNs with resource-constrained hardware. To resolve this inefficiency, prior works have explored time-step reduction and low-precision membrane potential representations at a limited scale and reported significant accuracy drops. Furthermore, while recent advances in on-device AI present pruning and quantization optimizations across different architectures and datasets, simultaneous pruning and quantization remains highly under-explored in SNNs. In this work, we present SpQuant-SNN, a fully quantized spiking neural network with ultra-low-precision weights, ultra-low-precision membrane potential, and high spatial-channel sparsity, enabling end-to-end low precision with significantly reduced operations in SNNs. First, we propose an integer-only quantization scheme for the membrane potential with a stacked surrogate gradient function, a simple yet effective method that enables a smooth learning process during quantized SNN training. Second, we implement spatial-channel pruning guided by a membrane-potential prior to reduce the layer-wise computational complexity and floating-point operations (FLOPs) of SNNs. Finally, to further improve the accuracy of the low-precision, sparse SNN, we propose a self-adaptive learnable potential threshold for SNN training. With high biological adaptiveness and minimal computation and memory utilization, SpQuant-SNN achieves state-of-the-art performance across multiple SNN models on both event-based and static image datasets, covering both image classification and object detection tasks. Compared to the state-of-the-art baseline, SpQuant-SNN achieves up to 13× memory reduction and >4.7× FLOPs reduction with <1.8% accuracy degradation on both classification and object detection tasks.
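To make the quantized-membrane-potential and firing mechanism described in the abstract concrete, the sketch below shows a minimal PyTorch implementation of a leaky integrate-and-fire (LIF) neuron that keeps its membrane potential at low integer precision, fires through a surrogate-gradient spike function, and uses a learnable firing threshold. This is an illustrative sketch under assumptions, not the authors' released code: the names (SurrogateSpike, IntLIFNeuron, potential_bits, decay), the rectangular surrogate window, and the straight-through rounding estimator are illustrative assumptions rather than details taken from the paper.

# Minimal sketch (assumed, not the paper's implementation) of a LIF neuron with
# an integer-quantized membrane potential, a surrogate-gradient spike, and a
# learnable firing threshold.
import torch
import torch.nn as nn


class SurrogateSpike(torch.autograd.Function):
    """Heaviside spike in the forward pass, smooth surrogate in the backward pass."""

    @staticmethod
    def forward(ctx, v_minus_thresh):
        ctx.save_for_backward(v_minus_thresh)
        return (v_minus_thresh >= 0).float()

    @staticmethod
    def backward(ctx, grad_output):
        (v_minus_thresh,) = ctx.saved_tensors
        # Rectangular surrogate: pass gradients only near the threshold crossing.
        surrogate = (v_minus_thresh.abs() < 0.5).float()
        return grad_output * surrogate


class IntLIFNeuron(nn.Module):
    """LIF neuron with an integer-quantized membrane potential (illustrative)."""

    def __init__(self, potential_bits: int = 4, decay: float = 0.5):
        super().__init__()
        self.levels = 2 ** (potential_bits - 1) - 1   # symmetric signed integer range
        self.decay = decay
        # Self-adaptive, learnable firing threshold (a single scalar here for simplicity).
        self.threshold = nn.Parameter(torch.tensor(1.0))

    def quantize(self, v: torch.Tensor) -> torch.Tensor:
        # Round the potential onto an integer grid; the straight-through estimator
        # keeps the rounding step differentiable during training.
        scale = self.threshold.detach() / self.levels
        v_q = torch.round(v / scale).clamp(-self.levels, self.levels) * scale
        return v + (v_q - v).detach()

    def forward(self, x_seq: torch.Tensor) -> torch.Tensor:
        # x_seq: [time_steps, batch, features]; returns binary spikes of the same shape.
        v = torch.zeros_like(x_seq[0])
        spikes = []
        for x_t in x_seq:
            v = self.quantize(self.decay * v + x_t)       # low-precision state update
            s = SurrogateSpike.apply(v - self.threshold)  # fire where v crosses threshold
            v = v - s * self.threshold                    # soft reset after a spike
            spikes.append(s)
        return torch.stack(spikes)


if __name__ == "__main__":
    neuron = IntLIFNeuron(potential_bits=4)
    out = neuron(torch.randn(8, 2, 16))  # 8 time steps, batch of 2, 16 features
    print(out.shape, out.unique())       # spikes are strictly {0, 1}

In this sketch the membrane potential only ever takes one of 2^potential_bits values between time steps, which is the storage-reduction effect the abstract attributes to integer-only potential quantization; the spatial-channel pruning step driven by a membrane-potential prior is not shown here.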