{"title":"Hybrid Attention Spike Transformer","authors":"Xiongfei Fan, Hong Zhang, Yu Zhang","doi":"10.1049/csy2.70010","DOIUrl":null,"url":null,"abstract":"<p>Spike transformers cannot be pretrained due to objective factors such as lack of datasets and memory constraints, which results in a significant performance gap compared to pretrained artificial neural networks (ANNs), thereby hindering their practical applicability. To address this issue, we propose a hybrid attention spike transformer that utilises self-attention with compound tokens and channel attention-based token processing to better capture the inductive biases of the data. We also add convolution in patch splitting and feedforward networks, which not only provides local information but also leverages the translation invariance and locality of convolutions to help the model converge. Experiments on static datasets and neuromorphic datasets demonstrate that our method achieves state-of-the-art performance in the spiking neural networks (SNNs) field. Notably, we achieve a top-1 accuracy of 80.59% on CIFAR-100 with only 4 time steps. As far as we know, it is the first exploration of the spike transformer with multiattention fusion, achieving outstanding effectiveness.</p>","PeriodicalId":34110,"journal":{"name":"IET Cybersystems and Robotics","volume":"7 1","pages":""},"PeriodicalIF":1.5000,"publicationDate":"2025-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/csy2.70010","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IET Cybersystems and Robotics","FirstCategoryId":"1085","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1049/csy2.70010","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
Citations: 0
Abstract
Spike transformers cannot be pretrained owing to practical factors such as the lack of suitable datasets and memory constraints, which results in a significant performance gap compared with pretrained artificial neural networks (ANNs) and hinders their practical applicability. To address this issue, we propose a hybrid attention spike transformer that utilises self-attention with compound tokens and channel attention-based token processing to better capture the inductive biases of the data. We also introduce convolutions into the patch-splitting and feed-forward modules, which provide local information and exploit the translation invariance and locality of convolutions to help the model converge. Experiments on static and neuromorphic datasets demonstrate that our method achieves state-of-the-art performance among spiking neural networks (SNNs). Notably, we achieve a top-1 accuracy of 80.59% on CIFAR-100 with only 4 time steps. To the best of our knowledge, this is the first exploration of a spike transformer with multi-attention fusion, and it proves highly effective.
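For illustration, the sketch below shows one way such a hybrid attention block could be organised: spiking (binarised) tokens fed through self-attention, fused with channel attention, and followed by a convolutional feed-forward path for local information. This is a minimal PyTorch sketch under our own assumptions; all module names, shapes and the simple Heaviside spike function are illustrative and do not reproduce the authors' implementation.

```python
# Minimal sketch of a hybrid attention block: spiking self-attention fused
# with channel attention, plus a convolutional feed-forward network.
# All names and shapes are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn


def spike(x: torch.Tensor) -> torch.Tensor:
    """Binarise activations into spikes (Heaviside). A surrogate gradient
    would replace this hard threshold when training a real SNN."""
    return (x > 0).float()


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation-style channel attention over token features."""
    def __init__(self, dim: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(dim, dim // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(dim // reduction, dim),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim); pool over tokens, then reweight channels
        w = self.fc(x.mean(dim=1, keepdim=True))
        return x * w


class HybridAttentionBlock(nn.Module):
    """Spiking self-attention fused with channel attention, followed by a
    convolutional feed-forward path over the token grid."""
    def __init__(self, dim: int = 128, heads: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.chan_attn = ChannelAttention(dim)
        self.norm2 = nn.LayerNorm(dim)
        # Depthwise 3x3 + pointwise 1x1 convs act as a local feed-forward
        self.ffn = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim),
            nn.Conv2d(dim, dim, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim), tokens assumed to form an h x w patch grid
        b, n, d = x.shape
        h = w = int(n ** 0.5)
        s = spike(self.norm1(x))                    # binary spike tokens
        a, _ = self.attn(s, s, s)                   # spiking self-attention
        x = x + self.chan_attn(a)                   # fuse channel attention
        y = self.norm2(x).transpose(1, 2).reshape(b, d, h, w)
        y = self.ffn(y).flatten(2).transpose(1, 2)  # convolutional FFN
        return x + spike(y)


if __name__ == "__main__":
    tokens = torch.randn(2, 64, 128)                # 8x8 patch grid, dim 128
    out = HybridAttentionBlock()(tokens)
    print(out.shape)                                # torch.Size([2, 64, 128])
```

The sketch keeps the residual structure of a standard transformer block while routing tokens through both attention mechanisms; a single time step is shown, whereas the paper reports results accumulated over 4 time steps.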