{"title":"MMA: Video Reconstruction for Spike Camera Based on Multiscale Temporal Modeling and Fine-Grained Attention","authors":"Dilmurat Alim;Chen Yang;Laiyun Qing;Guorong Li;Qingming Huang","doi":"10.1109/LSP.2025.3550278","DOIUrl":null,"url":null,"abstract":"This paper presents a Multiscale Temporal Correlation Learning with the Mamba-Fused Attention Model (MMA), an efficient and effective method for reconstructing a video clip from a spike stream. Spike cameras offer unique advantages for capturing rapid scene changes with high temporal resolution. A spike stream contains sufficient information for multiple image reconstructions. However, existing methods generate only a single image at a time for a given spike stream, which results in excessive redundant computations between consecutive frames when aiming at restoring a video clip, thereby increasing computational costs significantly. The proposed MMA addresses such challenges by constructing a spike-to-video model, directly producing an image sequence at a time. Specifically, we propose a U-shaped Multiscale Temporal Correlation Learning (MTCL) to fuse the features at different temporal resolutions for clear video reconstruction. At each scale, we introduce a Fine-Grained Attention (FGA) module for fine-spatial context modeling within a patch and a Mamba module for integrating features across patches. Adopting a lightweight U-shaped structure and fine-grained feature extraction at each level, our method reconstructs high-quality image sequences quickly. 
The experimental results show that the proposed MMA surpasses current state-of-the-art methods in image quality, computation cost, and model size.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1291-1295"},"PeriodicalIF":3.2000,"publicationDate":"2025-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Signal Processing Letters","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10921653/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Abstract
This paper presents Multiscale Temporal Correlation Learning with a Mamba-Fused Attention Model (MMA), an efficient and effective method for reconstructing a video clip from a spike stream. Spike cameras offer unique advantages for capturing rapid scene changes with high temporal resolution, and a spike stream contains sufficient information to reconstruct multiple images. However, existing methods generate only a single image at a time from a given spike stream, which leads to excessive redundant computation between consecutive frames when the goal is to restore a video clip, significantly increasing computational cost. The proposed MMA addresses this challenge by constructing a spike-to-video model that produces an entire image sequence in a single pass. Specifically, we propose a U-shaped Multiscale Temporal Correlation Learning (MTCL) module to fuse features at different temporal resolutions for clear video reconstruction. At each scale, we introduce a Fine-Grained Attention (FGA) module for fine-grained spatial context modeling within a patch and a Mamba module for integrating features across patches. By adopting a lightweight U-shaped structure with fine-grained feature extraction at each level, our method reconstructs high-quality image sequences quickly. Experimental results show that the proposed MMA surpasses current state-of-the-art methods in image quality, computational cost, and model size.
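For context, a spike camera fires a binary spike whenever accumulated brightness crosses a threshold, so the local firing rate over a time window encodes scene intensity. The following NumPy sketch is not the authors' model; all function names are illustrative. It only demonstrates the multiscale-temporal idea in its simplest form: pool a binary spike stream at several temporal resolutions, then fuse the scales into a short frame sequence (a crude stand-in for the learned U-shaped MTCL fusion):

```python
import numpy as np

def temporal_pyramid(spikes, scales=(1, 2, 4)):
    """Average-pool a spike stream [T, H, W] at several temporal
    scales, mimicking multiscale temporal feature extraction."""
    feats = []
    for s in scales:
        T = spikes.shape[0] - spikes.shape[0] % s  # trim to a multiple of s
        pooled = spikes[:T].reshape(T // s, s, *spikes.shape[1:]).mean(axis=1)
        feats.append(pooled)
    return feats

def fuse_to_frames(feats, n_frames):
    """Resample every scale to n_frames and average them --
    a hand-written stand-in for the learned U-shaped fusion."""
    fused = np.zeros((n_frames, *feats[0].shape[1:]))
    for f in feats:
        idx = np.linspace(0, f.shape[0] - 1, n_frames).round().astype(int)
        fused += f[idx]
    return fused / len(feats)

# Synthetic spike stream: firing probability proportional to intensity.
rng = np.random.default_rng(0)
intensity = rng.uniform(0.1, 0.9, size=(8, 8))          # static toy scene
spikes = (rng.uniform(size=(64, 8, 8)) < intensity).astype(float)

frames = fuse_to_frames(temporal_pyramid(spikes), n_frames=4)
print(frames.shape)  # (4, 8, 8): one pass yields a whole frame sequence
```

The point of the sketch is the output shape: unlike single-image methods, one pass over the spike stream yields all four frames, which is the redundancy the spike-to-video formulation removes. In MMA the averaging above is replaced by learned FGA and Mamba modules at each scale.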
Journal overview:
The IEEE Signal Processing Letters is a monthly, archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language, and audio processing. Papers published in the Letters may be presented, within one year of their appearance, at signal processing conferences such as ICASSP, GlobalSIP, and ICIP, as well as at several workshops organized by the Signal Processing Society.