Yi Jin;Xiaoxiao Ma;Rui Zhang;Huaian Chen;Yuxuan Gu;Pengyang Ling;Enhong Chen
IEEE Transactions on Multimedia, vol. 27, pp. 622-636. DOI: 10.1109/TMM.2024.3521818. Published: 2025-01-10. https://ieeexplore.ieee.org/document/10836851/
Masked Video Pretraining Advances Real-World Video Denoising
Learning-based video denoisers have attained state-of-the-art (SOTA) performance on public evaluation benchmarks. Nevertheless, they typically suffer significant performance drops when applied to unseen real-world data, owing to inherent data discrepancies. To address this problem, this work delves into model pretraining techniques and proposes masked central frame modeling (MCFM), a new video pretraining approach that significantly improves the generalization ability of the denoiser. This proposal stems from a key observation: pretraining the denoiser to reconstruct intact videos from corrupted sequences, in which the central frames are masked at a suitable probability, leads to superior performance on real-world data. Building upon MCFM, we introduce a robust video denoiser, named MVDenoiser, which is first pretrained on massive quantities of readily available ordinary videos for general video modeling, and then finetuned on costly real-world noisy/clean video pairs for noisy-to-clean mapping. Additionally, beyond the denoising model, we establish a new paired real-world noisy video dataset (RNVD) to facilitate cross-dataset evaluation of generalization ability. Extensive experiments conducted across different datasets demonstrate that the proposed method achieves superior performance compared to existing methods.
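The core MCFM idea described above — masking the central frame of a clip at some probability and training the network to reconstruct the intact sequence — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the zero-fill masking value, and the default probability are assumptions, and the actual method may mask at a finer granularity or use a different fill strategy.

```python
import numpy as np

def mask_central_frame(clip, p=0.5, rng=None):
    """Randomly mask the central frame of a video clip (hypothetical MCFM sketch).

    clip: array of shape (T, H, W, C); T is assumed odd so the central
    frame is well defined. With probability p the central frame is
    zeroed out. Returns (masked_clip, was_masked).
    """
    rng = rng or np.random.default_rng()
    masked = clip.copy()
    center = clip.shape[0] // 2      # index of the central frame
    was_masked = bool(rng.random() < p)
    if was_masked:
        masked[center] = 0.0         # replace the central frame with zeros
    return masked, was_masked

# Pretraining objective (conceptually): feed the masked clip to the
# denoiser and regress the *intact* clip, e.g. an L1/L2 loss between
# the network output and the original frames.
```

During pretraining, each sampled clip would pass through a masking step like this before being fed to the network; at finetuning time the masking is dropped and the same network maps real noisy frames to clean ones.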
Journal introduction:
The IEEE Transactions on Multimedia delves into diverse aspects of multimedia technology and applications, covering circuits, networking, signal processing, systems, software, and systems integration. The scope aligns with the Fields of Interest of the sponsors, ensuring a comprehensive exploration of research in multimedia.