Event-Based Video Reconstruction With Deep Spatial-Frequency Unfolding Network

IF 13.7

IEEE Transactions on Image Processing (a publication of the IEEE Signal Processing Society). Published: 2025-03-17. DOI: 10.1109/TIP.2025.3550008

Chengjie Ge, Xueyang Fu, Kunyu Wang, Zheng-Jun Zha

Volume 34, pp. 1779-1794. IEEE Xplore: https://ieeexplore.ieee.org/document/10930616/

Citations: 0

Abstract

Current event-based video reconstruction methods, limited to the spatial domain, face two challenges: they struggle to decouple brightness from structural information, which leads to exposure distortion, and they cannot efficiently acquire non-local information without relying on computationally expensive Transformer models. To address these issues, we propose the Deep Spatial-Frequency Unfolding Reconstruction Network (DSFURNet), which exploits frequency-domain knowledge for event-based video reconstruction. Specifically, we construct a variational model with three regularization terms: a brightness regularization term approximated by Fourier amplitudes, a structural regularization term approximated by Fourier phases, and an initialization regularization term that converts event representations into initial video frames. We then design a corresponding spatial-frequency approximation operator for each regularization term. Because frequency-domain computations are inherently global (each Fourier coefficient depends on every pixel), these operators can integrate local spatial and global frequency information at lower computational cost. Furthermore, we combine the learned knowledge of the three regularization terms and unfold the optimization algorithm into an iterative deep network. In this way, the pixel-level initialization constraint and the frequency-domain brightness and structural constraints remain active at test time, progressively improving the quality of the reconstructed video frames. Compared with existing methods, our network uses significantly fewer parameters while achieving better evaluation metrics.
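The abstract's central premise, that Fourier amplitude tracks brightness while Fourier phase tracks structure, and that frequency-domain operations are global, can be checked numerically. The NumPy sketch below is illustrative only and is not DSFURNet: the helper names (`split_amplitude_phase`, `merge_amplitude_phase`, `toy_unfolded_stages`), the fixed blending weights `alpha` and `beta`, and the use of the ground-truth amplitude as the brightness target are all hypothetical stand-ins for the paper's learned operators.

```python
import numpy as np


def split_amplitude_phase(frame):
    """Fourier amplitude (brightness-related) and phase (structure-related) of a 2-D frame."""
    spectrum = np.fft.fft2(frame)
    return np.abs(spectrum), np.angle(spectrum)


def merge_amplitude_phase(amplitude, phase):
    """Recombine an amplitude and a phase spectrum into a real-valued frame."""
    return np.real(np.fft.ifft2(amplitude * np.exp(1j * phase)))


def toy_unfolded_stages(init_frame, target_amplitude, num_stages=4, alpha=0.5, beta=0.5):
    """Hypothetical unfolded iterations with fixed (not learned) operators.

    Each stage nudges the Fourier amplitude toward `target_amplitude`
    (stand-in for a brightness operator) and the phase toward the phase of
    `init_frame` (stand-in for a structure operator). Returns every stage output.
    """
    _, reference_phase = split_amplitude_phase(init_frame)
    frame = init_frame
    outputs = []
    for _ in range(num_stages):
        amplitude, phase = split_amplitude_phase(frame)
        amplitude = (1 - alpha) * amplitude + alpha * target_amplitude
        # Blend unit phasors rather than raw angles to avoid wrap-around at +/-pi.
        phase = np.angle((1 - beta) * np.exp(1j * phase) + beta * np.exp(1j * reference_phase))
        frame = merge_amplitude_phase(amplitude, phase)
        outputs.append(frame)
    return outputs


rng = np.random.default_rng(0)
ground_truth = rng.random((16, 16))
under_exposed = 0.3 * ground_truth  # same structure, lower brightness

# 1) A global intensity scale changes only the amplitude, not the phase.
amp_gt, pha_gt = split_amplitude_phase(ground_truth)
amp_dark, pha_dark = split_amplitude_phase(under_exposed)
print(np.allclose(amp_dark, 0.3 * amp_gt))                                 # True
print(np.allclose(merge_amplitude_phase(amp_gt, pha_dark), ground_truth))  # True

# 2) Every Fourier coefficient depends on every pixel (global receptive field).
perturbed = ground_truth.copy()
perturbed[3, 5] += 1e-3
delta = np.fft.fft2(perturbed) - np.fft.fft2(ground_truth)
print(np.allclose(np.abs(delta), 1e-3))                                    # True

# 3) The toy stages move the under-exposed frame steadily toward the reference.
errors = [np.linalg.norm(f - ground_truth) for f in toy_unfolded_stages(under_exposed, amp_gt)]
print(all(a > b for a, b in zip(errors, errors[1:])))                      # True
```

Under these assumptions, scaling the frame by 0.3 mimics under-exposure: it rescales the amplitude but leaves the phase untouched, so restoring the amplitude alone recovers the frame, and perturbing a single pixel changes all 256 Fourier coefficients by the same magnitude. In a learned unfolding network the fixed blends would be replaced by trained operators, and the target amplitude would have to be predicted rather than read from ground truth.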
