SNN-BERT: Training-efficient Spiking Neural Networks for energy-efficient BERT

IF 6 | CAS Zone 1 (Computer Science) | JCR Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Networks | Pub Date: 2024-08-20 | DOI: 10.1016/j.neunet.2024.106630

Article URL: https://www.sciencedirect.com/science/article/pii/S0893608024005549

Citations: 0

Abstract

Spiking Neural Networks (SNNs) are naturally suited to processing sequence tasks such as NLP at low power, owing to their brain-inspired spatio-temporal dynamics and spike-driven nature. Current SNNs employ "repeat coding", which re-feeds all input tokens at every timestep; this fails to fully exploit the temporal relationships between tokens and introduces memory overhead. In this work, we align the number of timesteps with the number of input tokens, feeding one token per timestep, and refer to this input coding as "individual coding". To cope with the longer training time of individually coded SNNs caused by the dramatic increase in timesteps, we design a Bidirectional Parallel Spiking Neuron (BPSN) with the following features. First, BPSN supports parallel spike computation and effectively avoids the issue of uninterrupted firing. Second, BPSN handles tasks with adaptive sequence lengths, a capability that existing work lacks. Third, fusing bidirectional information enhances the temporal modeling capability of SNNs. To validate the effectiveness of BPSN, we present SNN-BERT, a deep, directly trained SNN architecture based on the BERT model in NLP. Compared with the prior 4-timestep repeat-coding baseline, our method reduces energy consumption by 6.46× and delivers a 16.1% performance improvement, raising the best reported SNN performance on the GLUE benchmark to 74.4%. Additionally, our method achieves a 3.5× training speedup and a 3.8× reduction in training memory. Compared with artificial neural networks of similar architecture, we obtain comparable performance with up to 22.5× higher energy efficiency. The code will be made available.
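The contrast between the two input codings, and the general idea of a spiking neuron whose timesteps can be computed in parallel using both past and future context, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' BPSN: the function names, the decay factor `beta`, the firing `threshold`, the reset-free exponential-decay membrane, and the averaging of forward and backward potentials are all hypothetical choices made here for clarity.

```python
from typing import List

Vector = List[float]


def repeat_coding(tokens: List[Vector], timesteps: int) -> List[List[Vector]]:
    """Repeat coding: every timestep re-feeds the whole token sequence.

    Returns a [timesteps][num_tokens][dim] nested list, so the materialized
    input grows with timesteps * num_tokens.
    """
    return [[list(tok) for tok in tokens] for _ in range(timesteps)]


def individual_coding(tokens: List[Vector]) -> List[List[Vector]]:
    """Individual coding: token i alone is fed at timestep i (T == num_tokens)."""
    return [[list(tok)] for tok in tokens]


def parallel_bidirectional_spikes(
    currents: List[float], beta: float = 0.5, threshold: float = 1.0
) -> List[int]:
    """Hypothetical parallel, bidirectional neuron (NOT the paper's BPSN).

    The membrane potential is a closed-form, reset-free sum of exponentially
    decayed inputs from both the past and the future. No timestep depends on
    an earlier spike, so each t can be computed independently, and the same
    code works for any sequence length.
    """
    n = len(currents)
    spikes = []
    for t in range(n):
        # Past-to-present integration (causal direction).
        forward = sum(beta ** (t - i) * currents[i] for i in range(t + 1))
        # Future-to-present integration (anti-causal direction).
        backward = sum(beta ** (i - t) * currents[i] for i in range(t, n))
        # Average both directions; the current input keeps weight 1.
        potential = 0.5 * (forward + backward)
        spikes.append(1 if potential >= threshold else 0)
    return spikes


if __name__ == "__main__":
    tokens = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
    rep = repeat_coding(tokens, timesteps=4)
    ind = individual_coding(tokens)
    print(sum(len(step) for step in rep), sum(len(step) for step in ind))
    print(parallel_bidirectional_spikes([2.0, 0.0, 0.0, 2.0]))
```

With four tokens and T = 4, `repeat_coding` materializes 16 token vectors across timesteps, whereas `individual_coding` materializes 4, mirroring the memory overhead the abstract attributes to repeat coding. Because each potential depends only on the input sequence and not on earlier spikes or resets, the loop over `t` could be vectorized or run fully in parallel.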




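Source journal

Neural Networks (Engineering & Technology: Computer Science, Artificial Intelligence)

CiteScore: 13.90

Self-citation rate: 7.70%

Articles published: 425

Review time: 67 days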

Journal description: Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. Our journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, we aim to encourage the development of biologically inspired artificial intelligence.