Self-supervised Learning of Event-guided Video Frame Interpolation for Rolling Shutter Frames.

Impact Factor: 6.5
Yunfan Lu, Guoqiang Liang, Yiran Shen, Lin Wang
{"title":"Self-supervised Learning of Event-guided Video Frame Interpolation for Rolling Shutter Frames.","authors":"Yunfan Lu, Guoqiang Liang, Yiran Shen, Lin Wang","doi":"10.1109/TVCG.2025.3576305","DOIUrl":null,"url":null,"abstract":"<p><p>Most consumer cameras use rolling shutter (RS) exposure, the captured videos often suffer from distortions (e.g., skew and jelly effect). Also, these videos are impeded by the limited bandwidth and frame rate, which inevitably affect the video streaming experience. In this paper, we excavate the potential of event cameras as they enjoy high temporal resolution. Accordingly, we propose a framework to recover the global shutter (GS) high frame rate (i.e., slow motion) video without RS distortion from an RS camera and event camera. One challenge is the lack of real-world datasets for supervised training. Therefore, we explore self-supervised learning with the key idea of estimating the displacement field-a non-linear and dense 3D spatiotemporal representation of all pixels during the exposure time. This allows for a mutual reconstruction between RS and GS frames and facilitates slow-motion video recovery. We then combine the input RS frames with the DF to map them to the GS frames (RS-to-GS). Given the under-constrained nature of this mapping, we integrate it with the inverse mapping (GS-to-RS) and RS frame warping (RS-to-RS) for self-supervision. We evaluate our framework via objective analysis (i.e., quantitative and qualitative comparisons on four datasets) and subjective studies (i.e., user study). The results show that our framework can recover slow-motion videos without distortion, with much lower bandwidth ($94\\%$ drop) and higher inference speed ($16ms/frame$) under $32 \\times$ frame interpolation. The dataset and source code are publicly available at: https://github.com/yunfanLu/Self-EvRSVFI.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":"PP ","pages":""},"PeriodicalIF":6.5000,"publicationDate":"2025-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on visualization and computer graphics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TVCG.2025.3576305","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Most consumer cameras use rolling shutter (RS) exposure, so the captured videos often suffer from distortions (e.g., skew and the jelly effect). These videos are also constrained by limited bandwidth and frame rate, which inevitably degrades the video streaming experience. In this paper, we exploit the potential of event cameras, which enjoy high temporal resolution. Accordingly, we propose a framework that recovers global shutter (GS), high-frame-rate (i.e., slow-motion) video without RS distortion from an RS camera and an event camera. One challenge is the lack of real-world datasets for supervised training. Therefore, we explore self-supervised learning with the key idea of estimating the displacement field (DF), a non-linear and dense 3D spatiotemporal representation of all pixels during the exposure time. This allows for mutual reconstruction between RS and GS frames and facilitates slow-motion video recovery. We then combine the input RS frames with the DF to map them to the GS frames (RS-to-GS). Given the under-constrained nature of this mapping, we integrate it with the inverse mapping (GS-to-RS) and RS frame warping (RS-to-RS) for self-supervision. We evaluate our framework via objective analysis (i.e., quantitative and qualitative comparisons on four datasets) and subjective studies (i.e., a user study). The results show that our framework can recover slow-motion videos without distortion, with much lower bandwidth (a 94% drop) and higher inference speed (16 ms/frame) under 32× frame interpolation. The dataset and source code are publicly available at: https://github.com/yunfanLu/Self-EvRSVFI.
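To make the self-supervision idea concrete, the sketch below illustrates, in PyTorch-style Python, how a displacement field can drive mutual reconstruction between RS and GS frames. This is a minimal illustrative sketch under simplifying assumptions, not the authors' implementation: the function names, the split into separate RS-to-GS and RS-to-RS displacement fields, and the omission of the event branch are all hypothetical.

```python
# Hypothetical sketch of DF-based self-supervision between RS and GS frames.
# Not the authors' code; the event branch and network architecture are omitted.
import torch
import torch.nn.functional as F


def backward_warp(frame, flow):
    """Warp `frame` (B, C, H, W) by a per-pixel displacement `flow` (B, 2, H, W), in pixels."""
    b, _, h, w = frame.shape
    # Base sampling grid in normalized [-1, 1] coordinates.
    ys, xs = torch.meshgrid(
        torch.linspace(-1.0, 1.0, h, device=frame.device),
        torch.linspace(-1.0, 1.0, w, device=frame.device),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(b, -1, -1, -1)
    # Convert pixel displacements to normalized offsets and shift the grid.
    offset = torch.stack(
        (flow[:, 0] / max(w - 1, 1) * 2.0, flow[:, 1] / max(h - 1, 1) * 2.0),
        dim=-1,
    )
    return F.grid_sample(frame, grid + offset, align_corners=True)


def charbonnier(x, eps=1e-3):
    """Robust L1-like photometric penalty."""
    return torch.sqrt(x * x + eps * eps).mean()


def self_supervised_loss(rs_0, rs_1, df_rs0_to_gs, df_rs1_to_gs, df_rs0_to_rs1):
    """Combine RS-to-GS mutual reconstruction with RS-to-RS warping.

    rs_0, rs_1 are two consecutive RS frames (B, 3, H, W); the df_* tensors
    (B, 2, H, W) stand in for displacement fields predicted by a network.
    """
    # RS-to-GS: two independent estimates of the same latent GS frame
    # should agree, even though no GS ground truth is available.
    gs_from_rs0 = backward_warp(rs_0, df_rs0_to_gs)
    gs_from_rs1 = backward_warp(rs_1, df_rs1_to_gs)
    loss_gs = charbonnier(gs_from_rs0 - gs_from_rs1)

    # RS-to-RS: warping one RS frame onto the other is supervised directly
    # by the observed frame, which further constrains the displacement fields.
    rs1_from_rs0 = backward_warp(rs_0, df_rs0_to_rs1)
    loss_rs = charbonnier(rs1_from_rs0 - rs_1)

    return loss_gs + loss_rs


if __name__ == "__main__":
    # Toy usage with random frames and zero-initialized displacement fields.
    b, h, w = 2, 64, 96
    rs_0, rs_1 = torch.rand(b, 3, h, w), torch.rand(b, 3, h, w)
    dfs = [torch.zeros(b, 2, h, w, requires_grad=True) for _ in range(3)]
    print(self_supervised_loss(rs_0, rs_1, *dfs).item())
```

In the paper's setting the displacement fields are predicted from the RS frames and the event stream; the zero-initialized tensors above only stand in for that prediction so the loss terms can be exercised end to end.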
