SpikingSSMs: Learning Long Sequences with Sparse and Parallel Spiking State Space Models

Shuaijie Shen, Chao Wang, Renzhuo Huang, Yan Zhong, Qinghai Guo, Zhichao Lu, Jianguo Zhang, Luziwei Leng
arXiv - CS - Neural and Evolutionary Computing, published 2024-08-27. DOI: arxiv-2408.14909 (https://doi.org/arxiv-2408.14909)

Abstract

Known as low-energy-consumption networks, spiking neural networks (SNNs) have gained a lot of attention over the past decades. While SNNs are increasingly competitive with artificial neural networks (ANNs) on vision tasks, they are rarely used for long-sequence tasks, despite their intrinsic temporal dynamics. In this work, we develop spiking state space models (SpikingSSMs) for long-sequence learning by leveraging the sequence learning abilities of state space models (SSMs). Inspired by dendritic neuron structure, we hierarchically integrate neuronal dynamics with the original SSM block while realizing sparse synaptic computation. Furthermore, to resolve the conflict between event-driven neuronal dynamics and parallel computing, we propose a lightweight surrogate dynamic network that accurately predicts the after-reset membrane potential and is compatible with learnable thresholds, enabling orders-of-magnitude acceleration in training speed compared with conventional iterative methods. On the Long Range Arena benchmark, SpikingSSM achieves performance competitive with state-of-the-art SSMs while realizing 90% network sparsity on average. On language modeling, our network significantly surpasses existing spiking large language models (spikingLLMs) on the WikiText-103 dataset with only a third of the model size, demonstrating its potential as a backbone architecture for low-computation-cost LLMs.
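To make the conflict between event-driven dynamics and parallel computing concrete, the following is a minimal sketch of the conventional iterative approach the abstract contrasts against. It assumes a leaky integrate-and-fire neuron with a hard reset to zero; the abstract does not specify the neuron model, so the function names `lif_forward` and `sparsity`, the `decay` and `threshold` parameters, and the reset rule are all illustrative assumptions rather than the authors' implementation. Because the after-reset membrane potential at each step depends on whether a spike occurred at the previous step, the loop cannot be trivially parallelized over time, which is the bottleneck a surrogate network predicting that potential would aim to remove.

```python
def lif_forward(inputs, threshold=1.0, decay=0.5):
    """Iterate an (assumed) leaky integrate-and-fire neuron over a sequence.

    Returns the binary spike train and the after-reset membrane
    potential at every time step.
    """
    v = 0.0  # membrane potential carried from step to step
    spikes, potentials = [], []
    for x in inputs:
        v = decay * v + x               # leaky integration of the input
        s = 1 if v >= threshold else 0  # spike when the threshold is reached
        if s:
            v = 0.0                     # hard reset (assumption)
        spikes.append(s)
        potentials.append(v)            # after-reset potential at this step
    return spikes, potentials


def sparsity(spikes):
    """Fraction of time steps without a spike."""
    return 1.0 - sum(spikes) / len(spikes)


if __name__ == "__main__":
    spikes, potentials = lif_forward([0.6, 0.6, 0.1, 0.9, 0.0])
    print(spikes, potentials, sparsity(spikes))
```

In this sketch only the spike train would be passed on to the next layer, so a high value of `sparsity` corresponds to few synaptic operations downstream.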