{"title":"Self-supervised Learning of Event-guided Video Frame Interpolation for Rolling Shutter Frames.","authors":"Yunfan Lu, Guoqiang Liang, Yiran Shen, Lin Wang","doi":"10.1109/TVCG.2025.3576305","DOIUrl":null,"url":null,"abstract":"<p><p>Most consumer cameras use rolling shutter (RS) exposure, the captured videos often suffer from distortions (e.g., skew and jelly effect). Also, these videos are impeded by the limited bandwidth and frame rate, which inevitably affect the video streaming experience. In this paper, we excavate the potential of event cameras as they enjoy high temporal resolution. Accordingly, we propose a framework to recover the global shutter (GS) high frame rate (i.e., slow motion) video without RS distortion from an RS camera and event camera. One challenge is the lack of real-world datasets for supervised training. Therefore, we explore self-supervised learning with the key idea of estimating the displacement field-a non-linear and dense 3D spatiotemporal representation of all pixels during the exposure time. This allows for a mutual reconstruction between RS and GS frames and facilitates slow-motion video recovery. We then combine the input RS frames with the DF to map them to the GS frames (RS-to-GS). Given the under-constrained nature of this mapping, we integrate it with the inverse mapping (GS-to-RS) and RS frame warping (RS-to-RS) for self-supervision. We evaluate our framework via objective analysis (i.e., quantitative and qualitative comparisons on four datasets) and subjective studies (i.e., user study). The results show that our framework can recover slow-motion videos without distortion, with much lower bandwidth ($94\\%$ drop) and higher inference speed ($16ms/frame$) under $32 \\times$ frame interpolation. The dataset and source code are publicly available at: https://github.com/yunfanLu/Self-EvRSVFI.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":"PP ","pages":""},"PeriodicalIF":6.5000,"publicationDate":"2025-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on visualization and computer graphics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TVCG.2025.3576305","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Most consumer cameras use rolling shutter (RS) exposure, so the captured videos often suffer from distortions (e.g., skew and the jelly effect). These videos are also constrained by limited bandwidth and frame rate, which inevitably degrades the video streaming experience. In this paper, we exploit the potential of event cameras, which offer high temporal resolution. Accordingly, we propose a framework that recovers global shutter (GS), high-frame-rate (i.e., slow-motion) video free of RS distortion from an RS camera and an event camera. One challenge is the lack of real-world datasets for supervised training. Therefore, we explore self-supervised learning, with the key idea of estimating the displacement field (DF): a non-linear, dense 3D spatiotemporal representation of all pixels during the exposure time. This allows for mutual reconstruction between RS and GS frames and facilitates slow-motion video recovery. We then combine the input RS frames with the DF to map them to the GS frames (RS-to-GS). Given the under-constrained nature of this mapping, we integrate it with the inverse mapping (GS-to-RS) and RS frame warping (RS-to-RS) for self-supervision. We evaluate our framework via objective analysis (i.e., quantitative and qualitative comparisons on four datasets) and subjective studies (i.e., a user study). The results show that our framework can recover slow-motion videos without distortion, with much lower bandwidth (a 94% reduction) and fast inference (16 ms/frame) under 32× frame interpolation. The dataset and source code are publicly available at: https://github.com/yunfanLu/Self-EvRSVFI.
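The abstract describes the self-supervision scheme only at a high level. Below is a minimal PyTorch sketch of how the three reconstruction terms (RS-to-GS, GS-to-RS, and RS-to-RS) could be combined into a photometric loss once the per-direction displacement fields have been predicted. All names (`warp`, `SelfSupervisedRSGSLoss`, the per-direction displacement tensors) are illustrative assumptions, not the authors' implementation; the actual code is in the linked repository.

```python
# Hedged sketch: a 2D backward-warping consistency loss between RS and GS frames.
# This is NOT the paper's implementation; shapes and names are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


def warp(frame: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Backward-warp `frame` (B, C, H, W) by a dense displacement `flow` (B, 2, H, W)."""
    b, _, h, w = frame.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=frame.device, dtype=frame.dtype),
        torch.arange(w, device=frame.device, dtype=frame.dtype),
        indexing="ij",
    )
    grid_x = xs.unsqueeze(0) + flow[:, 0]   # sample locations along width
    grid_y = ys.unsqueeze(0) + flow[:, 1]   # sample locations along height
    # Normalize sampling coordinates to [-1, 1] as required by grid_sample.
    grid = torch.stack(
        (2.0 * grid_x / (w - 1) - 1.0, 2.0 * grid_y / (h - 1) - 1.0), dim=-1
    )
    return F.grid_sample(frame, grid, align_corners=True)


class SelfSupervisedRSGSLoss(nn.Module):
    """Combines the three reconstruction terms named in the abstract:
    RS-to-GS, GS-to-RS, and RS-to-RS, so no GS ground truth is needed."""

    def forward(self, rs0, rs1, gs_pred, df_rs2gs, df_gs2rs, df_rs2rs):
        # RS-to-GS: warping an input RS frame by the DF should reproduce the GS prediction.
        loss_rs2gs = F.l1_loss(warp(rs0, df_rs2gs), gs_pred)
        # GS-to-RS: warping the GS prediction back should reproduce the input RS frame.
        loss_gs2rs = F.l1_loss(warp(gs_pred, df_gs2rs), rs0)
        # RS-to-RS: warping one RS frame onto the other adds a further constraint.
        loss_rs2rs = F.l1_loss(warp(rs0, df_rs2rs), rs1)
        return loss_rs2gs + loss_gs2rs + loss_rs2rs
```

In the actual framework the displacement field is a dense 3D spatiotemporal volume that accounts for the per-row exposure of the rolling shutter; the 2D backward warp above is only meant to illustrate how the forward, inverse, and RS-to-RS mappings can supervise one another without GS ground truth.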