{"title":"Self-Supervised Learning for Rolling Shutter Temporal Super-Resolution","authors":"Bin Fan;Ying Guo;Yuchao Dai;Chao Xu;Boxin Shi","doi":"10.1109/TCSVT.2024.3462520","DOIUrl":null,"url":null,"abstract":"Most cameras on portable devices adopt a rolling shutter (RS) mechanism, encoding sufficient temporal dynamic information through sequential readouts. This advantage can be exploited to recover a temporal sequence of latent global shutter (GS) images. Existing methods rely on fully supervised learning, necessitating specialized optical devices to collect paired RS-GS images as ground-truth, which is too costly to scale. In this paper, we propose a self-supervised learning framework for the first time to produce a high frame rate GS video from two consecutive RS images, unleashing the potential of RS cameras. Specifically, we first develop the unified warping model of RS2GS and GS2RS, enabling the complement conversions of RS2GS and GS2RS to be incorporated into a uniform network model. Then, based on the cycle consistency constraint, given a triplet of consecutive RS frames, we minimize the discrepancy between the input middle RS frame and its cycle reconstruction, generated by interpolating back from the predicted two intermediate GS frames. Experiments on various benchmarks show that our approach achieves comparable or better performance than state-of-the-art supervised methods while enjoying stronger generalization capabilities. Moreover, our approach makes it possible to recover smooth and distortion-free videos from two adjacent RS frames in the real-world BS-RSC dataset, surpassing prior limitations.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 1","pages":"769-782"},"PeriodicalIF":8.3000,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Circuits and Systems for Video Technology","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10681568/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0
Abstract
Most cameras on portable devices adopt a rolling shutter (RS) mechanism, which encodes rich temporal dynamic information through its sequential row-wise readout. This property can be exploited to recover a temporal sequence of latent global shutter (GS) images. Existing methods rely on fully supervised learning and thus require specialized optical devices to collect paired RS-GS images as ground truth, which is too costly to scale. In this paper, we propose, for the first time, a self-supervised learning framework that produces a high-frame-rate GS video from two consecutive RS images, unleashing the potential of RS cameras. Specifically, we first develop a unified warping model for RS2GS and GS2RS, enabling these two complementary conversions to be incorporated into a single network model. Then, based on a cycle-consistency constraint, given a triplet of consecutive RS frames, we minimize the discrepancy between the input middle RS frame and its cycle reconstruction, generated by interpolating back from the two predicted intermediate GS frames. Experiments on various benchmarks show that our approach achieves performance comparable to or better than that of state-of-the-art supervised methods while exhibiting stronger generalization. Moreover, our approach makes it possible to recover smooth, distortion-free videos from two adjacent RS frames in the real-world BS-RSC dataset, surpassing prior limitations.
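The cycle-consistency objective described above translates naturally into a training loss. Below is a minimal PyTorch-style sketch, not the authors' implementation: the callables `rs2gs_net` and `gs2rs_net` are hypothetical stand-ins for the two directions of the unified warping model, the exact timestamps of the predicted GS frames are simplified, and an L1 photometric penalty is assumed as the discrepancy measure.

```python
# A minimal sketch of the self-supervised cycle-consistency loss.
# Assumptions (not from the paper's released code): `rs2gs_net` maps a
# pair of consecutive RS frames to one intermediate GS frame, and
# `gs2rs_net` re-synthesizes an RS frame from two GS frames via the
# inverse warp of the same unified warping model.
import torch
import torch.nn.functional as F

def cycle_consistency_loss(rs2gs_net, gs2rs_net, rs_prev, rs_mid, rs_next):
    """Loss over a triplet of consecutive RS frames, each of shape (B, C, H, W)."""
    # RS2GS: predict two intermediate GS frames from the two adjacent pairs.
    gs_a = rs2gs_net(rs_prev, rs_mid)   # GS frame between rs_prev and rs_mid
    gs_b = rs2gs_net(rs_mid, rs_next)   # GS frame between rs_mid and rs_next

    # GS2RS: interpolate back from the two predicted GS frames to
    # reconstruct the middle RS frame (the cycle reconstruction).
    rs_mid_rec = gs2rs_net(gs_a, gs_b)

    # Minimize the discrepancy between the input middle RS frame and its
    # cycle reconstruction; no GS ground truth is needed anywhere.
    return F.l1_loss(rs_mid_rec, rs_mid)
```

Because both conversion directions share the unified warping model, this single photometric term can supervise the RS2GS network without any paired RS-GS data, which is the point of the self-supervised design.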
Journal Introduction:
The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.