{"title":"Spiking Trans-YOLO: A range-adaptive energy-efficient bridge between YOLO and Transformer","authors":"Yushi Huo, Hongwei Ge, Guozhi Tang, Shengxuan Gao, Jiale Xu","doi":"10.1016/j.neucom.2025.130407","DOIUrl":null,"url":null,"abstract":"<div><div>The remarkable success of Transformers in Artificial Neural Networks (ANNs) has driven growing interest in leveraging self-attention mechanisms and Transformer-based architectures for Spiking Neural Network (SNN) object detection. However, existing methods combining Transformers with YOLO lack reasonable bridging strategies, leading to significantly increased computational costs and limitations in local feature extraction. To address these challenges, we propose Spiking Trans-YOLO, introducing a Top-Attention Hybrid Feature Fusion module, which exclusively applies self-attention to high-level spiking features that are more stable and semantically meaningful. This prevents redundant computations caused by unstable low-level spikes and reduces energy consumption. Subsequently, we perform cross-scale feature fusion to compensate for Transformers’ shortcomings in local feature extraction. This approach efficiently bridges YOLO and Transformer architectures while preserving the low-power characteristics of SNNs. Additionally, The newly proposed Integer Leaky Integrate-and-Fire (I-LIF) neuron has demonstrated significant potential in SNNs by enabling integer-valued training and spike-driven inference, thereby reducing quantization errors. However, existing spiking self-attention mechanisms fail to incorporate proper scaling factors for I-LIF neurons, which may lead to gradient vanishing. To address this, we propose a Range-Adaptive Spiking Attention for intra-scale interactions. By dynamically adjusting scaling coefficients, RASA mitigates gradient vanishing issues associated with integer training, allowing I-LIF neurons to exploit the benefits of spike-based self-attention fully. The proposed method achieves 67.7% mAP@50 on the COCO dataset and 68.6% mAP@50 on the Gen1 dataset, outperforming state-of-the-art YOLO architectures and achieving superior energy efficiency compared to advanced Transformer-based architectures. Code: <span><span>https://github.com/s1110/Spiking-Trans-YOLO</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"645 ","pages":"Article 130407"},"PeriodicalIF":5.5000,"publicationDate":"2025-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neurocomputing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0925231225010793","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Abstract
The remarkable success of Transformers in Artificial Neural Networks (ANNs) has driven growing interest in leveraging self-attention mechanisms and Transformer-based architectures for Spiking Neural Network (SNN) object detection. However, existing methods combining Transformers with YOLO lack reasonable bridging strategies, leading to significantly increased computational costs and limitations in local feature extraction. To address these challenges, we propose Spiking Trans-YOLO, which introduces a Top-Attention Hybrid Feature Fusion module that applies self-attention exclusively to high-level spiking features, which are more stable and semantically meaningful. This avoids redundant computation caused by unstable low-level spikes and reduces energy consumption. We then perform cross-scale feature fusion to compensate for Transformers’ weakness in local feature extraction. This approach efficiently bridges YOLO and Transformer architectures while preserving the low-power characteristics of SNNs. Additionally, the recently proposed Integer Leaky Integrate-and-Fire (I-LIF) neuron has demonstrated significant potential in SNNs by enabling integer-valued training and spike-driven inference, thereby reducing quantization errors. However, existing spiking self-attention mechanisms do not incorporate scaling factors suited to I-LIF neurons, which can lead to vanishing gradients. To address this, we propose Range-Adaptive Spiking Attention (RASA) for intra-scale interactions. By dynamically adjusting the scaling coefficient, RASA mitigates the vanishing-gradient issues associated with integer training, allowing I-LIF neurons to fully exploit the benefits of spike-based self-attention. The proposed method achieves 67.7% mAP@50 on the COCO dataset and 68.6% mAP@50 on the Gen1 dataset, outperforming state-of-the-art YOLO architectures and achieving superior energy efficiency compared with advanced Transformer-based architectures. Code: https://github.com/s1110/Spiking-Trans-YOLO.
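To make the idea of range-adaptive scaling concrete, the following is a minimal PyTorch-style sketch, not the authors' implementation. The class names (ILIFNeuron, RangeAdaptiveSpikingAttention), the parameter d_max, and the specific scaling formula 1 / (sqrt(d_head) * D^2) are illustrative assumptions; the paper's I-LIF surrogate gradient, reset dynamics, and exact RASA coefficient may differ. The sketch only shows the general principle described in the abstract: when queries and keys take integer values in [0, D], the attention logits grow roughly with D^2, so the scale is tied to the integer range rather than fixed at 1 / sqrt(d_head).

```python
import torch
import torch.nn as nn


class ILIFNeuron(nn.Module):
    """Single-step sketch of an Integer Leaky Integrate-and-Fire (I-LIF) neuron.

    The membrane potential is rounded and clipped to integers in [0, d_max]
    during training (integer-valued activations); at inference the integer
    output can be unrolled into binary spike steps. Details are assumptions.
    """

    def __init__(self, d_max: int = 4, tau: float = 2.0, v_threshold: float = 1.0):
        super().__init__()
        self.d_max = d_max              # upper bound D of the integer activation range
        self.tau = tau                  # leak time constant
        self.v_threshold = v_threshold  # firing threshold

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        v = x / self.tau
        y = torch.clamp(torch.round(v / self.v_threshold), 0, self.d_max)
        # Straight-through estimator: gradients bypass the rounding step.
        return v + (y - v).detach()


class RangeAdaptiveSpikingAttention(nn.Module):
    """Illustrative spiking self-attention with a range-adaptive scale.

    Softmax is omitted, as is common in spiking self-attention where Q, K, V
    are non-negative integer spike maps.
    """

    def __init__(self, dim: int, num_heads: int = 8, d_max: int = 4):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.d_max = d_max
        self.q_proj = nn.Linear(dim, dim, bias=False)
        self.k_proj = nn.Linear(dim, dim, bias=False)
        self.v_proj = nn.Linear(dim, dim, bias=False)
        self.out_proj = nn.Linear(dim, dim, bias=False)
        self.spike = ILIFNeuron(d_max=d_max)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, c = x.shape
        # Integer-valued (spiking) queries, keys and values.
        q = self.spike(self.q_proj(x)).reshape(b, n, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.spike(self.k_proj(x)).reshape(b, n, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.spike(self.v_proj(x)).reshape(b, n, self.num_heads, self.head_dim).transpose(1, 2)

        # Range-adaptive scale: shrink logits by the square of the integer range D
        # so they stay comparable to the binary-spike (D = 1) case.
        scale = 1.0 / (self.head_dim ** 0.5 * self.d_max ** 2)
        out = ((q @ k.transpose(-2, -1)) * scale) @ v
        out = out.transpose(1, 2).reshape(b, n, c)
        return self.out_proj(out)
```

With a fixed 1 / sqrt(d_head) scale, larger integer ranges inflate the logits and push downstream spiking neurons into saturation, which is one way the vanishing-gradient problem mentioned above can arise; tying the scale to D keeps the logit magnitude roughly range-independent.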
Journal information:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. The essential topics covered are neurocomputing theory, practice, and applications.