{"title":"MMA: Video Reconstruction for Spike Camera Based on Multiscale Temporal Modeling and Fine-Grained Attention","authors":"Dilmurat Alim;Chen Yang;Laiyun Qing;Guorong Li;Qingming Huang","doi":"10.1109/LSP.2025.3550278","DOIUrl":null,"url":null,"abstract":"This paper presents Multiscale Temporal Correlation Learning with a Mamba-Fused Attention Model (MMA), an efficient and effective method for reconstructing a video clip from a spike stream. Spike cameras offer unique advantages for capturing rapid scene changes with high temporal resolution, and a spike stream contains sufficient information for multiple image reconstructions. However, existing methods generate only a single image at a time from a given spike stream, which results in excessive redundant computation between consecutive frames when restoring a video clip, thereby significantly increasing computational cost. The proposed MMA addresses these challenges with a spike-to-video model that produces an entire image sequence in a single pass. Specifically, we propose a U-shaped Multiscale Temporal Correlation Learning (MTCL) module to fuse features at different temporal resolutions for clear video reconstruction. At each scale, we introduce a Fine-Grained Attention (FGA) module for fine spatial context modeling within a patch and a Mamba module for integrating features across patches. By adopting a lightweight U-shaped structure and fine-grained feature extraction at each level, our method reconstructs high-quality image sequences quickly. The experimental results show that the proposed MMA surpasses current state-of-the-art methods in image quality, computational cost, and model size.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1291-1295"},"PeriodicalIF":3.2000,"publicationDate":"2025-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Signal Processing Letters","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10921653/","RegionNum":2,"RegionCategory":"Engineering & Technology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0
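As a rough illustration of the two ideas the abstract names, fine-grained attention within a spatial patch and fusing features at two temporal resolutions, the following NumPy sketch shows a minimal, hypothetical version of each. This is not the authors' implementation: the function names, the plain dot-product attention, and the simple pair-averaging coarse temporal scale are all assumptions made for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def patch_attention(feat, patch=4):
    """Toy stand-in for fine-grained attention: plain self-attention
    applied independently within each non-overlapping patch.

    feat: (H, W, C) feature map; assumes H and W are divisible by `patch`.
    """
    H, W, C = feat.shape
    out = np.empty_like(feat)
    for i in range(0, H, patch):
        for j in range(0, W, patch):
            p = feat[i:i + patch, j:j + patch].reshape(-1, C)   # (N, C) tokens
            attn = softmax(p @ p.T / np.sqrt(C), axis=-1)       # (N, N) weights
            out[i:i + patch, j:j + patch] = (attn @ p).reshape(patch, patch, C)
    return out

def multiscale_temporal_fuse(frames):
    """Toy two-scale temporal fusion: build a coarse stream by averaging
    adjacent frame pairs, upsample it back, and blend with the fine stream.

    frames: (T, H, W, C) with T even.
    """
    T = frames.shape[0]
    coarse = frames.reshape(T // 2, 2, *frames.shape[1:]).mean(axis=1)
    coarse_up = np.repeat(coarse, 2, axis=0)  # nearest-neighbor upsample in time
    return 0.5 * (frames + coarse_up)
```

Both functions preserve the input shape, so they compose: one could run `patch_attention` on each frame and then `multiscale_temporal_fuse` across the clip. The paper's actual model additionally mixes information across patches with a Mamba module, which this sketch omits.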
Journal introduction:
The IEEE Signal Processing Letters is a monthly, archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language, and audio processing. Papers published in the Letters can be presented within one year of their appearance at signal processing conferences such as ICASSP, GlobalSIP, and ICIP, and also at several workshops organized by the Signal Processing Society.