{"title":"Event-Based Video Reconstruction Via Spatial–Temporal Heterogeneous Spiking Neural Network","authors":"Jiajie Yu;Xing Lu;Lijun Guo;Chong Wang;Guoqi Li;Jiangbo Qian","doi":"10.1109/TCSVT.2025.3550901","DOIUrl":null,"url":null,"abstract":"Event cameras detect per-pixel brightness changes and output asynchronous event streams with high temporal resolution, high dynamic range, and low latency. However, the unstructured nature of event streams means that humans cannot analyze and interpret them in the same way as natural images. Event-based video reconstruction is a widely used method aimed at reconstructing intuitive videos from event streams. Most reconstruction methods based on traditional artificial neural networks (ANNs) have high energy consumption, which counteracts the low-power advantage of event cameras. Spiking neural networks (SNNs) are a new generation of event-driven neural networks that encode information via discrete spikes, which leads to greater computational efficiency. Previous methods based on SNNs overlooked the asynchronous nature of event streams, leading to reconstructions that suffer from artifacts, flickering, low contrast, etc. In this work, we analyze event streams and spiking neurons and explain poor reconstruction quality. We specifically propose a novel spatial-temporal heterogeneous (STH) spiking neuron suitable for reconstructing asynchronous event streams. The STH neuron adjusts the membrane decay coefficient adaptively and has better spatiotemporal perception. In addition, we propose a temporal-frequency calibration module (TFCM) based on the Fourier transform to improve the contrast of the reconstructions. On the basis of the above proposed neuron and module, we construct two SNN-based models, referred to as the STHSNN and TFCSNN. The goal of the former is to reduce the artifacts and flickering in reconstructions, whereas the latter focuses on enhancing the contrast. The experimental results demonstrate that our models can yield reconstructions in various scenarios, achieving better quality and lower energy consumption than previous SNNs. Specifically, the TFCSNN and STHSNN achieve top-2 performance among the SNN-based models, with energy consumption reductions of 3.48 times and 12.40 times, respectively.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"8478-8494"},"PeriodicalIF":11.1000,"publicationDate":"2025-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Circuits and Systems for Video Technology","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10925495/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0
Abstract
Event cameras detect per-pixel brightness changes and output asynchronous event streams with high temporal resolution, high dynamic range, and low latency. However, the unstructured nature of event streams means that humans cannot analyze and interpret them in the same way as natural images. Event-based video reconstruction is a widely used method aimed at reconstructing intuitive videos from event streams. Most reconstruction methods based on traditional artificial neural networks (ANNs) have high energy consumption, which counteracts the low-power advantage of event cameras. Spiking neural networks (SNNs) are a new generation of event-driven neural networks that encode information via discrete spikes, which leads to greater computational efficiency. Previous SNN-based methods overlooked the asynchronous nature of event streams, leading to reconstructions that suffer from artifacts, flickering, and low contrast. In this work, we analyze event streams and spiking neurons and explain the causes of the poor reconstruction quality. We propose a novel spatial-temporal heterogeneous (STH) spiking neuron suited to reconstructing asynchronous event streams. The STH neuron adjusts its membrane decay coefficient adaptively and has better spatiotemporal perception. In addition, we propose a temporal-frequency calibration module (TFCM) based on the Fourier transform to improve the contrast of the reconstructions. Building on the proposed neuron and module, we construct two SNN-based models, referred to as the STHSNN and TFCSNN. The former aims to reduce artifacts and flickering in the reconstructions, whereas the latter focuses on enhancing contrast. The experimental results demonstrate that our models yield reconstructions across various scenarios with better quality and lower energy consumption than previous SNN-based methods. Specifically, the TFCSNN and STHSNN achieve top-2 performance among the SNN-based models, while reducing energy consumption by factors of 3.48 and 12.40, respectively.
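The abstract only outlines the two mechanisms, so the sketch below is a minimal, hypothetical PyTorch illustration of the underlying ideas rather than the paper's actual STH neuron or TFCM: a leaky integrate-and-fire neuron whose membrane decay coefficient is predicted per pixel and per time step from the input (spatial-temporal heterogeneity), and a toy frequency-domain contrast adjustment built on the 2-D FFT. The class and function names, the 1x1-convolution parameterization of the decay, and the fixed spectral gain are all assumptions made for illustration.

```python
import torch
import torch.nn as nn


class AdaptiveDecayLIF(nn.Module):
    """Illustrative LIF neuron with an input-dependent membrane decay.

    The decay coefficient is produced per channel, pixel, and time step
    instead of being one fixed constant. This is a hypothetical stand-in
    for the idea of spatial-temporal heterogeneity, not the paper's STH
    neuron. Training would additionally require a surrogate gradient for
    the hard threshold; only the forward pass is sketched here.
    """

    def __init__(self, channels: int, v_threshold: float = 1.0):
        super().__init__()
        # 1x1 convolution mapping the current input to a decay value in (0, 1)
        # at every spatial location (assumed parameterization).
        self.decay_gate = nn.Conv2d(channels, channels, kernel_size=1)
        self.v_threshold = v_threshold

    def forward(self, x_seq: torch.Tensor) -> torch.Tensor:
        # x_seq: [T, B, C, H, W] input currents, e.g. from binned event frames.
        v = torch.zeros_like(x_seq[0])
        spikes = []
        for x_t in x_seq:
            beta = torch.sigmoid(self.decay_gate(x_t))  # adaptive decay in (0, 1)
            v = beta * v + x_t                           # leaky integration
            s = (v >= self.v_threshold).float()          # fire at threshold
            v = v - s * self.v_threshold                 # soft reset
            spikes.append(s)
        return torch.stack(spikes)                       # [T, B, C, H, W]


def fourier_contrast_calibration(img: torch.Tensor, gain: float = 1.2) -> torch.Tensor:
    """Toy frequency-domain contrast boost: scale the non-DC spectrum and
    invert the transform. The paper's TFCM is learned; the fixed gain here
    is only for illustration."""
    spec = torch.fft.fft2(img)
    dc = spec[..., :1, :1].clone()       # remember the mean (DC) component
    spec = spec * gain                   # amplify all frequencies
    spec[..., :1, :1] = dc               # restore mean brightness
    return torch.fft.ifft2(spec).real.clamp(0.0, 1.0)


if __name__ == "__main__":
    neuron = AdaptiveDecayLIF(channels=4)
    events = torch.rand(8, 2, 4, 32, 32)          # T=8 steps, batch of 2
    out = neuron(events)                           # binary spike sequence
    frame = fourier_contrast_calibration(torch.rand(2, 1, 32, 32))
    print(out.shape, frame.shape)
```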
Journal Introduction:
The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.