{"title":"Cascaded UNet for progressive noise residual prediction for structure-preserving video denoising","authors":"Abhijeet Pimpale, Kishor Bhurchandi","doi":"10.1016/j.cviu.2024.104103","DOIUrl":null,"url":null,"abstract":"<div><p>The prominence of high-quality video services has become so substantial that by 2030, it is estimated that approximately 80% of internet traffic will consist of videos. On the contrary, video denoising remains a relatively unexplored and intricate field, presenting more substantial challenges compared to image denoising. Many published deep learning video denoising algorithms typically rely on simple, efficient single encoder–decoder networks, but they have inherent limitations in preserving intricate image details and effectively managing noise information propagation for noise residue modelling. In response to these challenges, the proposed work introduces an innovative approach; in terms of utilization of cascaded UNets for progressive noise residual prediction in video denoising. This multi-stage encoder–decoder architecture is meticulously designed to accurately predict noise residual maps, thereby preserving the locally fine details within video content as represented by SSIM. The proposed network has undergone extensive end-to-end training from scratch without explicit motion compensation to reduce complexity. In terms of the more rigorous SSIM metric, the proposed network outperformed all video denoising methods while maintaining a comparable PSNR.</p></div>","PeriodicalId":50633,"journal":{"name":"Computer Vision and Image Understanding","volume":null,"pages":null},"PeriodicalIF":4.3000,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Vision and Image Understanding","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S107731422400184X","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
The prominence of high-quality video services has become so substantial that by 2030, it is estimated that approximately 80% of internet traffic will consist of videos. On the contrary, video denoising remains a relatively unexplored and intricate field, presenting more substantial challenges compared to image denoising. Many published deep learning video denoising algorithms typically rely on simple, efficient single encoder–decoder networks, but they have inherent limitations in preserving intricate image details and effectively managing noise information propagation for noise residue modelling. In response to these challenges, the proposed work introduces an innovative approach; in terms of utilization of cascaded UNets for progressive noise residual prediction in video denoising. This multi-stage encoder–decoder architecture is meticulously designed to accurately predict noise residual maps, thereby preserving the locally fine details within video content as represented by SSIM. The proposed network has undergone extensive end-to-end training from scratch without explicit motion compensation to reduce complexity. In terms of the more rigorous SSIM metric, the proposed network outperformed all video denoising methods while maintaining a comparable PSNR.
期刊介绍:
The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views.
Research Areas Include:
• Theory
• Early vision
• Data structures and representations
• Shape
• Range
• Motion
• Matching and recognition
• Architecture and languages
• Vision systems