{"title":"Event-Based Video Reconstruction Via Spatial–Temporal Heterogeneous Spiking Neural Network","authors":"Jiajie Yu;Xing Lu;Lijun Guo;Chong Wang;Guoqi Li;Jiangbo Qian","doi":"10.1109/TCSVT.2025.3550901","DOIUrl":null,"url":null,"abstract":"Event cameras detect per-pixel brightness changes and output asynchronous event streams with high temporal resolution, high dynamic range, and low latency. However, the unstructured nature of event streams means that humans cannot analyze and interpret them in the same way as natural images. Event-based video reconstruction is a widely used method aimed at reconstructing intuitive videos from event streams. Most reconstruction methods based on traditional artificial neural networks (ANNs) have high energy consumption, which counteracts the low-power advantage of event cameras. Spiking neural networks (SNNs) are a new generation of event-driven neural networks that encode information via discrete spikes, which leads to greater computational efficiency. Previous methods based on SNNs overlooked the asynchronous nature of event streams, leading to reconstructions that suffer from artifacts, flickering, low contrast, etc. In this work, we analyze event streams and spiking neurons and explain poor reconstruction quality. We specifically propose a novel spatial-temporal heterogeneous (STH) spiking neuron suitable for reconstructing asynchronous event streams. The STH neuron adjusts the membrane decay coefficient adaptively and has better spatiotemporal perception. In addition, we propose a temporal-frequency calibration module (TFCM) based on the Fourier transform to improve the contrast of the reconstructions. On the basis of the above proposed neuron and module, we construct two SNN-based models, referred to as the STHSNN and TFCSNN. The goal of the former is to reduce the artifacts and flickering in reconstructions, whereas the latter focuses on enhancing the contrast. The experimental results demonstrate that our models can yield reconstructions in various scenarios, achieving better quality and lower energy consumption than previous SNNs. Specifically, the TFCSNN and STHSNN achieve top-2 performance among the SNN-based models, with energy consumption reductions of 3.48 times and 12.40 times, respectively.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"8478-8494"},"PeriodicalIF":11.1000,"publicationDate":"2025-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Circuits and Systems for Video Technology","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10925495/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0
Abstract
Event cameras detect per-pixel brightness changes and output asynchronous event streams with high temporal resolution, high dynamic range, and low latency. However, the unstructured nature of event streams means that humans cannot analyze and interpret them in the same way as natural images. Event-based video reconstruction is a widely used method aimed at reconstructing intuitive videos from event streams. Most reconstruction methods based on traditional artificial neural networks (ANNs) have high energy consumption, which counteracts the low-power advantage of event cameras. Spiking neural networks (SNNs) are a new generation of event-driven neural networks that encode information via discrete spikes, which leads to greater computational efficiency. Previous SNN-based methods overlooked the asynchronous nature of event streams, leading to reconstructions that suffer from artifacts, flickering, and low contrast. In this work, we analyze event streams and spiking neurons and explain the causes of the poor reconstruction quality. We propose a novel spatial-temporal heterogeneous (STH) spiking neuron suited to reconstructing asynchronous event streams. The STH neuron adjusts its membrane decay coefficient adaptively and has better spatiotemporal perception. In addition, we propose a temporal-frequency calibration module (TFCM) based on the Fourier transform to improve the contrast of the reconstructions. Building on the proposed neuron and module, we construct two SNN-based models, referred to as the STHSNN and TFCSNN. The former aims to reduce artifacts and flickering in the reconstructions, whereas the latter focuses on enhancing contrast. The experimental results demonstrate that our models yield reconstructions across various scenarios with better quality and lower energy consumption than previous SNN-based methods. Specifically, the TFCSNN and STHSNN achieve top-2 performance among the SNN-based models, while reducing energy consumption by factors of 3.48 and 12.40, respectively.
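The abstract only outlines the two mechanisms, so the sketch below is a minimal, hypothetical PyTorch illustration of the underlying ideas rather than the paper's actual STH neuron or TFCM: a leaky integrate-and-fire neuron whose membrane decay coefficient is predicted per pixel and per time step from the input (spatial-temporal heterogeneity), and a toy frequency-domain contrast adjustment built on the 2-D FFT. The class and function names, the 1x1-convolution parameterization of the decay, and the fixed spectral gain are all assumptions made for illustration.

```python
import torch
import torch.nn as nn


class AdaptiveDecayLIF(nn.Module):
    """Illustrative LIF neuron with an input-dependent membrane decay.

    The decay coefficient is produced per channel, pixel, and time step
    instead of being one fixed constant. This is a hypothetical stand-in
    for the idea of spatial-temporal heterogeneity, not the paper's STH
    neuron. Training would additionally require a surrogate gradient for
    the hard threshold; only the forward pass is sketched here.
    """

    def __init__(self, channels: int, v_threshold: float = 1.0):
        super().__init__()
        # 1x1 convolution mapping the current input to a decay value in (0, 1)
        # at every spatial location (assumed parameterization).
        self.decay_gate = nn.Conv2d(channels, channels, kernel_size=1)
        self.v_threshold = v_threshold

    def forward(self, x_seq: torch.Tensor) -> torch.Tensor:
        # x_seq: [T, B, C, H, W] input currents, e.g. from binned event frames.
        v = torch.zeros_like(x_seq[0])
        spikes = []
        for x_t in x_seq:
            beta = torch.sigmoid(self.decay_gate(x_t))  # adaptive decay in (0, 1)
            v = beta * v + x_t                           # leaky integration
            s = (v >= self.v_threshold).float()          # fire at threshold
            v = v - s * self.v_threshold                 # soft reset
            spikes.append(s)
        return torch.stack(spikes)                       # [T, B, C, H, W]


def fourier_contrast_calibration(img: torch.Tensor, gain: float = 1.2) -> torch.Tensor:
    """Toy frequency-domain contrast boost: scale the non-DC spectrum and
    invert the transform. The paper's TFCM is learned; the fixed gain here
    is only for illustration."""
    spec = torch.fft.fft2(img)
    dc = spec[..., :1, :1].clone()       # remember the mean (DC) component
    spec = spec * gain                   # amplify all frequencies
    spec[..., :1, :1] = dc               # restore mean brightness
    return torch.fft.ifft2(spec).real.clamp(0.0, 1.0)


if __name__ == "__main__":
    neuron = AdaptiveDecayLIF(channels=4)
    events = torch.rand(8, 2, 4, 32, 32)          # T=8 steps, batch of 2
    out = neuron(events)                           # binary spike sequence
    frame = fourier_contrast_calibration(torch.rand(2, 1, 32, 32))
    print(out.shape, frame.shape)
```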
Journal Introduction:
The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.