{"title":"S2DNet:一个使用单目视频的自监督训练网络","authors":"Aman Kumar, Aditya Mohan, A.N. Rajagopalan","doi":"10.1016/j.cviu.2025.104444","DOIUrl":null,"url":null,"abstract":"<div><div>Rainy conditions degrade the visual quality of images, thus presenting significant challenges for various vision-based downstream tasks. Traditional deraining approaches often rely on supervised learning methods requiring large, paired datasets of rainy and clean images. However, due to the dynamic and complex nature of rain, compiling such datasets is challenging and often insufficient for training robust models. As a result, researchers often resort to synthetic datasets. However, synthetic datasets have limitations because they often lack realism, can introduce biases, and seldom capture the diversity of real rain scenes. We propose a self-supervised method for image deraining using monocular videos that leverages the fact that rain moves spatially across frames, independently of the static elements in a scene, thus enabling isolation of rain-affected regions. We utilize depth information from the target frame and the camera’s relative pose (translations and rotations) across frames to achieve scene alignment. We apply a view-synthesis constraint that warps features from adjacent frames to the target frame, which enables us to generate pseudo-ground truth images by selecting clean pixels from the warped frame. The pseudo-clean images thus generated are effectively leveraged by our network to remove rain from images in a self-supervised manner without the need for a real rain paired dataset which is difficult to capture. Extensive evaluations on diverse real-world rainy datasets demonstrate that our approach achieves state-of-the-art performance in real image deraining, outperforming existing unsupervised methods.</div></div>","PeriodicalId":50633,"journal":{"name":"Computer Vision and Image Understanding","volume":"259 ","pages":"Article 104444"},"PeriodicalIF":4.3000,"publicationDate":"2025-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"S2DNet: A self-supervised deraining network using monocular videos\",\"authors\":\"Aman Kumar, Aditya Mohan, A.N. Rajagopalan\",\"doi\":\"10.1016/j.cviu.2025.104444\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Rainy conditions degrade the visual quality of images, thus presenting significant challenges for various vision-based downstream tasks. Traditional deraining approaches often rely on supervised learning methods requiring large, paired datasets of rainy and clean images. However, due to the dynamic and complex nature of rain, compiling such datasets is challenging and often insufficient for training robust models. As a result, researchers often resort to synthetic datasets. However, synthetic datasets have limitations because they often lack realism, can introduce biases, and seldom capture the diversity of real rain scenes. We propose a self-supervised method for image deraining using monocular videos that leverages the fact that rain moves spatially across frames, independently of the static elements in a scene, thus enabling isolation of rain-affected regions. We utilize depth information from the target frame and the camera’s relative pose (translations and rotations) across frames to achieve scene alignment. 
We apply a view-synthesis constraint that warps features from adjacent frames to the target frame, which enables us to generate pseudo-ground truth images by selecting clean pixels from the warped frame. The pseudo-clean images thus generated are effectively leveraged by our network to remove rain from images in a self-supervised manner without the need for a real rain paired dataset which is difficult to capture. Extensive evaluations on diverse real-world rainy datasets demonstrate that our approach achieves state-of-the-art performance in real image deraining, outperforming existing unsupervised methods.</div></div>\",\"PeriodicalId\":50633,\"journal\":{\"name\":\"Computer Vision and Image Understanding\",\"volume\":\"259 \",\"pages\":\"Article 104444\"},\"PeriodicalIF\":4.3000,\"publicationDate\":\"2025-07-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computer Vision and Image Understanding\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1077314225001675\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Vision and Image Understanding","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1077314225001675","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
S2DNet: A self-supervised deraining network using monocular videos
Rainy conditions degrade the visual quality of images, presenting significant challenges for various vision-based downstream tasks. Traditional deraining approaches often rely on supervised learning, which requires large paired datasets of rainy and clean images. However, due to the dynamic and complex nature of rain, compiling such datasets is challenging, and the data collected are often insufficient for training robust models. As a result, researchers often resort to synthetic datasets. These have limitations of their own: they often lack realism, can introduce biases, and seldom capture the diversity of real rain scenes. We propose a self-supervised method for image deraining using monocular videos that leverages the fact that rain moves spatially across frames, independently of the static elements in a scene, enabling isolation of rain-affected regions. We utilize depth information from the target frame and the camera's relative pose (translations and rotations) across frames to achieve scene alignment. We apply a view-synthesis constraint that warps features from adjacent frames to the target frame, which enables us to generate pseudo-ground-truth images by selecting clean pixels from the warped frames. The pseudo-clean images thus generated are leveraged by our network to remove rain in a self-supervised manner, without the need for a real-rain paired dataset, which is difficult to capture. Extensive evaluations on diverse real-world rainy datasets demonstrate that our approach achieves state-of-the-art performance in real image deraining, outperforming existing unsupervised methods.
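To make the view-synthesis step concrete, below is a minimal PyTorch sketch of how target-frame depth and relative camera pose can be used to warp an adjacent frame into the target view and assemble a pseudo-clean image. The function names, the pinhole-projection details, and in particular the per-pixel-minimum rule for selecting clean pixels are illustrative assumptions, not the paper's stated implementation.

```python
import torch
import torch.nn.functional as F


def backproject(depth, K_inv):
    """Lift every target pixel to a 3-D point using its depth (pinhole model)."""
    b, _, h, w = depth.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=depth.device),
        torch.arange(w, device=depth.device),
        indexing="ij",
    )
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).float().view(1, 3, -1)
    rays = K_inv @ pix                      # pixel rays in camera coordinates
    return rays * depth.view(b, 1, -1)      # scale rays by depth -> 3-D points


def warp_to_target(src_img, tgt_depth, K, T_tgt_to_src):
    """Inverse-warp an adjacent (source) frame into the target view.

    src_img:      (B, 3, H, W) adjacent frame
    tgt_depth:    (B, 1, H, W) depth of the target frame
    K:            (3, 3) camera intrinsics
    T_tgt_to_src: (B, 4, 4) relative pose from target to source camera
    """
    b, _, h, w = src_img.shape
    pts = backproject(tgt_depth, torch.linalg.inv(K))                      # (B, 3, HW)
    pts_h = torch.cat([pts, torch.ones(b, 1, h * w, device=pts.device)], dim=1)
    pts_src = (T_tgt_to_src @ pts_h)[:, :3]                                # points in source frame
    proj = K @ pts_src                                                     # project to source pixels
    uv = proj[:, :2] / proj[:, 2:].clamp(min=1e-6)
    # Normalise pixel coordinates to [-1, 1] as required by grid_sample.
    u = 2.0 * uv[:, 0] / (w - 1) - 1.0
    v = 2.0 * uv[:, 1] / (h - 1) - 1.0
    grid = torch.stack([u, v], dim=-1).view(b, h, w, 2)
    return F.grid_sample(src_img, grid, padding_mode="border", align_corners=True)


def pseudo_clean(tgt_img, warped_neighbours):
    """Assumed selection rule: per-pixel minimum over the target frame and its
    warped neighbours, since rain streaks are usually brighter than the scene
    they occlude. The paper's actual clean-pixel criterion may differ."""
    stack = torch.stack([tgt_img] + warped_neighbours, dim=0)
    return stack.min(dim=0).values
```

Under these assumptions, a training pipeline would warp the previous and next frames with `warp_to_target`, build a pseudo-ground-truth image with `pseudo_clean`, and supervise the deraining network against it, relying on rain moving across frames while the aligned static scene stays consistent.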
Journal introduction:
The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views.
Research Areas Include:
• Theory
• Early vision
• Data structures and representations
• Shape
• Range
• Motion
• Matching and recognition
• Architecture and languages
• Vision systems