{"title":"S2DNet:一个使用单目视频的自监督训练网络","authors":"Aman Kumar, Aditya Mohan, A.N. Rajagopalan","doi":"10.1016/j.cviu.2025.104444","DOIUrl":null,"url":null,"abstract":"<div><div>Rainy conditions degrade the visual quality of images, thus presenting significant challenges for various vision-based downstream tasks. Traditional deraining approaches often rely on supervised learning methods requiring large, paired datasets of rainy and clean images. However, due to the dynamic and complex nature of rain, compiling such datasets is challenging and often insufficient for training robust models. As a result, researchers often resort to synthetic datasets. However, synthetic datasets have limitations because they often lack realism, can introduce biases, and seldom capture the diversity of real rain scenes. We propose a self-supervised method for image deraining using monocular videos that leverages the fact that rain moves spatially across frames, independently of the static elements in a scene, thus enabling isolation of rain-affected regions. We utilize depth information from the target frame and the camera’s relative pose (translations and rotations) across frames to achieve scene alignment. We apply a view-synthesis constraint that warps features from adjacent frames to the target frame, which enables us to generate pseudo-ground truth images by selecting clean pixels from the warped frame. The pseudo-clean images thus generated are effectively leveraged by our network to remove rain from images in a self-supervised manner without the need for a real rain paired dataset which is difficult to capture. Extensive evaluations on diverse real-world rainy datasets demonstrate that our approach achieves state-of-the-art performance in real image deraining, outperforming existing unsupervised methods.</div></div>","PeriodicalId":50633,"journal":{"name":"Computer Vision and Image Understanding","volume":"259 ","pages":"Article 104444"},"PeriodicalIF":4.3000,"publicationDate":"2025-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"S2DNet: A self-supervised deraining network using monocular videos\",\"authors\":\"Aman Kumar, Aditya Mohan, A.N. Rajagopalan\",\"doi\":\"10.1016/j.cviu.2025.104444\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Rainy conditions degrade the visual quality of images, thus presenting significant challenges for various vision-based downstream tasks. Traditional deraining approaches often rely on supervised learning methods requiring large, paired datasets of rainy and clean images. However, due to the dynamic and complex nature of rain, compiling such datasets is challenging and often insufficient for training robust models. As a result, researchers often resort to synthetic datasets. However, synthetic datasets have limitations because they often lack realism, can introduce biases, and seldom capture the diversity of real rain scenes. We propose a self-supervised method for image deraining using monocular videos that leverages the fact that rain moves spatially across frames, independently of the static elements in a scene, thus enabling isolation of rain-affected regions. We utilize depth information from the target frame and the camera’s relative pose (translations and rotations) across frames to achieve scene alignment. 
We apply a view-synthesis constraint that warps features from adjacent frames to the target frame, which enables us to generate pseudo-ground truth images by selecting clean pixels from the warped frame. The pseudo-clean images thus generated are effectively leveraged by our network to remove rain from images in a self-supervised manner without the need for a real rain paired dataset which is difficult to capture. Extensive evaluations on diverse real-world rainy datasets demonstrate that our approach achieves state-of-the-art performance in real image deraining, outperforming existing unsupervised methods.</div></div>\",\"PeriodicalId\":50633,\"journal\":{\"name\":\"Computer Vision and Image Understanding\",\"volume\":\"259 \",\"pages\":\"Article 104444\"},\"PeriodicalIF\":4.3000,\"publicationDate\":\"2025-07-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computer Vision and Image Understanding\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1077314225001675\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Vision and Image Understanding","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1077314225001675","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
S2DNet: A self-supervised deraining network using monocular videos
Rainy conditions degrade the visual quality of images, presenting significant challenges for various vision-based downstream tasks. Traditional deraining approaches often rely on supervised learning, which requires large paired datasets of rainy and clean images. However, due to the dynamic and complex nature of rain, compiling such datasets is challenging, and the data collected are often insufficient for training robust models. As a result, researchers often resort to synthetic datasets. These have limitations of their own: they often lack realism, can introduce biases, and seldom capture the diversity of real rain scenes. We propose a self-supervised method for image deraining using monocular videos that leverages the fact that rain moves spatially across frames, independently of the static elements in a scene, enabling isolation of rain-affected regions. We utilize depth information from the target frame and the camera's relative pose (translations and rotations) across frames to achieve scene alignment. We apply a view-synthesis constraint that warps features from adjacent frames to the target frame, which enables us to generate pseudo-ground-truth images by selecting clean pixels from the warped frames. The pseudo-clean images thus generated are leveraged by our network to remove rain in a self-supervised manner, without the need for a real-rain paired dataset, which is difficult to capture. Extensive evaluations on diverse real-world rainy datasets demonstrate that our approach achieves state-of-the-art performance in real image deraining, outperforming existing unsupervised methods.
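To make the view-synthesis step concrete, below is a minimal PyTorch sketch of how target-frame depth and relative camera pose can be used to warp an adjacent frame into the target view and assemble a pseudo-clean image. The function names, the pinhole-projection details, and in particular the per-pixel-minimum rule for selecting clean pixels are illustrative assumptions, not the paper's stated implementation.

```python
import torch
import torch.nn.functional as F


def backproject(depth, K_inv):
    """Lift every target pixel to a 3-D point using its depth (pinhole model)."""
    b, _, h, w = depth.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=depth.device),
        torch.arange(w, device=depth.device),
        indexing="ij",
    )
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).float().view(1, 3, -1)
    rays = K_inv @ pix                      # pixel rays in camera coordinates
    return rays * depth.view(b, 1, -1)      # scale rays by depth -> 3-D points


def warp_to_target(src_img, tgt_depth, K, T_tgt_to_src):
    """Inverse-warp an adjacent (source) frame into the target view.

    src_img:      (B, 3, H, W) adjacent frame
    tgt_depth:    (B, 1, H, W) depth of the target frame
    K:            (3, 3) camera intrinsics
    T_tgt_to_src: (B, 4, 4) relative pose from target to source camera
    """
    b, _, h, w = src_img.shape
    pts = backproject(tgt_depth, torch.linalg.inv(K))                      # (B, 3, HW)
    pts_h = torch.cat([pts, torch.ones(b, 1, h * w, device=pts.device)], dim=1)
    pts_src = (T_tgt_to_src @ pts_h)[:, :3]                                # points in source frame
    proj = K @ pts_src                                                     # project to source pixels
    uv = proj[:, :2] / proj[:, 2:].clamp(min=1e-6)
    # Normalise pixel coordinates to [-1, 1] as required by grid_sample.
    u = 2.0 * uv[:, 0] / (w - 1) - 1.0
    v = 2.0 * uv[:, 1] / (h - 1) - 1.0
    grid = torch.stack([u, v], dim=-1).view(b, h, w, 2)
    return F.grid_sample(src_img, grid, padding_mode="border", align_corners=True)


def pseudo_clean(tgt_img, warped_neighbours):
    """Assumed selection rule: per-pixel minimum over the target frame and its
    warped neighbours, since rain streaks are usually brighter than the scene
    they occlude. The paper's actual clean-pixel criterion may differ."""
    stack = torch.stack([tgt_img] + warped_neighbours, dim=0)
    return stack.min(dim=0).values
```

Under these assumptions, a training pipeline would warp the previous and next frames with `warp_to_target`, build a pseudo-ground-truth image with `pseudo_clean`, and supervise the deraining network against it, relying on rain moving across frames while the aligned static scene stays consistent.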
Journal introduction:
The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views.
Research Areas Include:
• Theory
• Early vision
• Data structures and representations
• Shape
• Range
• Motion
• Matching and recognition
• Architecture and languages
• Vision systems