{"title":"Spiking Trans-YOLO: A range-adaptive energy-efficient bridge between YOLO and Transformer","authors":"Yushi Huo, Hongwei Ge, Guozhi Tang, Shengxuan Gao, Jiale Xu","doi":"10.1016/j.neucom.2025.130407","DOIUrl":null,"url":null,"abstract":"<div><div>The remarkable success of Transformers in Artificial Neural Networks (ANNs) has driven growing interest in leveraging self-attention mechanisms and Transformer-based architectures for Spiking Neural Network (SNN) object detection. However, existing methods combining Transformers with YOLO lack reasonable bridging strategies, leading to significantly increased computational costs and limitations in local feature extraction. To address these challenges, we propose Spiking Trans-YOLO, introducing a Top-Attention Hybrid Feature Fusion module, which exclusively applies self-attention to high-level spiking features that are more stable and semantically meaningful. This prevents redundant computations caused by unstable low-level spikes and reduces energy consumption. Subsequently, we perform cross-scale feature fusion to compensate for Transformers’ shortcomings in local feature extraction. This approach efficiently bridges YOLO and Transformer architectures while preserving the low-power characteristics of SNNs. Additionally, The newly proposed Integer Leaky Integrate-and-Fire (I-LIF) neuron has demonstrated significant potential in SNNs by enabling integer-valued training and spike-driven inference, thereby reducing quantization errors. However, existing spiking self-attention mechanisms fail to incorporate proper scaling factors for I-LIF neurons, which may lead to gradient vanishing. To address this, we propose a Range-Adaptive Spiking Attention for intra-scale interactions. By dynamically adjusting scaling coefficients, RASA mitigates gradient vanishing issues associated with integer training, allowing I-LIF neurons to exploit the benefits of spike-based self-attention fully. The proposed method achieves 67.7% mAP@50 on the COCO dataset and 68.6% mAP@50 on the Gen1 dataset, outperforming state-of-the-art YOLO architectures and achieving superior energy efficiency compared to advanced Transformer-based architectures. Code: <span><span>https://github.com/s1110/Spiking-Trans-YOLO</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"645 ","pages":"Article 130407"},"PeriodicalIF":5.5000,"publicationDate":"2025-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neurocomputing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0925231225010793","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Abstract
The remarkable success of Transformers in Artificial Neural Networks (ANNs) has driven growing interest in leveraging self-attention mechanisms and Transformer-based architectures for Spiking Neural Network (SNN) object detection. However, existing methods combining Transformers with YOLO lack reasonable bridging strategies, leading to significantly increased computational costs and limitations in local feature extraction. To address these challenges, we propose Spiking Trans-YOLO, which introduces a Top-Attention Hybrid Feature Fusion module that applies self-attention exclusively to high-level spiking features, which are more stable and semantically meaningful. This avoids redundant computation caused by unstable low-level spikes and reduces energy consumption. We then perform cross-scale feature fusion to compensate for Transformers’ weakness in local feature extraction. This approach efficiently bridges YOLO and Transformer architectures while preserving the low-power characteristics of SNNs. Additionally, the recently proposed Integer Leaky Integrate-and-Fire (I-LIF) neuron has demonstrated significant potential in SNNs by enabling integer-valued training and spike-driven inference, thereby reducing quantization errors. However, existing spiking self-attention mechanisms do not incorporate scaling factors suited to I-LIF neurons, which can lead to vanishing gradients. To address this, we propose Range-Adaptive Spiking Attention (RASA) for intra-scale interactions. By dynamically adjusting the scaling coefficient, RASA mitigates the vanishing-gradient issues associated with integer training, allowing I-LIF neurons to fully exploit the benefits of spike-based self-attention. The proposed method achieves 67.7% mAP@50 on the COCO dataset and 68.6% mAP@50 on the Gen1 dataset, outperforming state-of-the-art YOLO architectures and achieving superior energy efficiency compared with advanced Transformer-based architectures. Code: https://github.com/s1110/Spiking-Trans-YOLO.
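To make the idea of range-adaptive scaling concrete, the following is a minimal PyTorch-style sketch, not the authors' implementation. The class names (ILIFNeuron, RangeAdaptiveSpikingAttention), the parameter d_max, and the specific scaling formula 1 / (sqrt(d_head) * D^2) are illustrative assumptions; the paper's I-LIF surrogate gradient, reset dynamics, and exact RASA coefficient may differ. The sketch only shows the general principle described in the abstract: when queries and keys take integer values in [0, D], the attention logits grow roughly with D^2, so the scale is tied to the integer range rather than fixed at 1 / sqrt(d_head).

```python
import torch
import torch.nn as nn


class ILIFNeuron(nn.Module):
    """Single-step sketch of an Integer Leaky Integrate-and-Fire (I-LIF) neuron.

    The membrane potential is rounded and clipped to integers in [0, d_max]
    during training (integer-valued activations); at inference the integer
    output can be unrolled into binary spike steps. Details are assumptions.
    """

    def __init__(self, d_max: int = 4, tau: float = 2.0, v_threshold: float = 1.0):
        super().__init__()
        self.d_max = d_max              # upper bound D of the integer activation range
        self.tau = tau                  # leak time constant
        self.v_threshold = v_threshold  # firing threshold

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        v = x / self.tau
        y = torch.clamp(torch.round(v / self.v_threshold), 0, self.d_max)
        # Straight-through estimator: gradients bypass the rounding step.
        return v + (y - v).detach()


class RangeAdaptiveSpikingAttention(nn.Module):
    """Illustrative spiking self-attention with a range-adaptive scale.

    Softmax is omitted, as is common in spiking self-attention where Q, K, V
    are non-negative integer spike maps.
    """

    def __init__(self, dim: int, num_heads: int = 8, d_max: int = 4):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.d_max = d_max
        self.q_proj = nn.Linear(dim, dim, bias=False)
        self.k_proj = nn.Linear(dim, dim, bias=False)
        self.v_proj = nn.Linear(dim, dim, bias=False)
        self.out_proj = nn.Linear(dim, dim, bias=False)
        self.spike = ILIFNeuron(d_max=d_max)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, c = x.shape
        # Integer-valued (spiking) queries, keys and values.
        q = self.spike(self.q_proj(x)).reshape(b, n, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.spike(self.k_proj(x)).reshape(b, n, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.spike(self.v_proj(x)).reshape(b, n, self.num_heads, self.head_dim).transpose(1, 2)

        # Range-adaptive scale: shrink logits by the square of the integer range D
        # so they stay comparable to the binary-spike (D = 1) case.
        scale = 1.0 / (self.head_dim ** 0.5 * self.d_max ** 2)
        out = ((q @ k.transpose(-2, -1)) * scale) @ v
        out = out.transpose(1, 2).reshape(b, n, c)
        return self.out_proj(out)
```

With a fixed 1 / sqrt(d_head) scale, larger integer ranges inflate the logits and push downstream spiking neurons into saturation, which is one way the vanishing-gradient problem mentioned above can arise; tying the scale to D keeps the logit magnitude roughly range-independent.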
Journal information:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. The essential topics covered are neurocomputing theory, practice, and applications.