Title: Enhancing outdoor vision: Binocular desnowing with dual-stream temporal transformer
Authors: En Yu, Jie Lu, Kaihao Zhang, Guangquan Zhang
Journal: Pattern Recognition, vol. 170, Article 112075
Publication date: 2025-07-02
Publication type: Journal Article
DOI: 10.1016/j.patcog.2025.112075
URL: https://www.sciencedirect.com/science/article/pii/S0031320325007356
Impact factor: 7.5 (JCR Q1, Computer Science, Artificial Intelligence)
Citation count: 0
Abstract
Video desnowing, which removes snowflakes from video to improve visual quality, is an important but difficult task for outdoor vision systems. Compared with rain and haze, snowflakes are opaque and vary widely in shape, so they occlude the background more severely. This limits current desnowing techniques, particularly those that rely solely on images or videos captured from a monocular perspective. To address these challenges, this paper proposes a Dual-Stream Temporal Transformer (DSTT) that improves snow removal and visual enhancement by exploiting information from stereo views together with spatial-temporal cues. Specifically, a Dual-Stream Weight-shared Transformer (DSWT) module exploits spatial information from different views, using a hierarchical weight-sharing strategy to extract fused spatial features across views from low-level to high-level layers. A Dual-Stream ConvLSTM (DS-CLSTM) module then captures temporal correlations across streaming frames. By combining spatial-temporal cues with complementary details from the different views, the method restores videos while preserving the details of the original content. In addition, two binocular snowy datasets, SnowKITTI2012 and SnowKITTI2015, are presented as a resource for evaluating the binocular desnowing task. Comprehensive experiments on both synthetic and real-world snowy datasets demonstrate that the proposed method outperforms state-of-the-art baselines.
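The data flow described in the abstract (one set of weights applied to both views, fusion of the two feature streams, then a recurrent state carried across frames) can be illustrated with a toy sketch. This is not the authors' DSTT implementation, and the abstract does not specify these details. The names `SharedEncoder`, `GatedMemory`, and `run_dual_stream` are hypothetical, frames are flat feature vectors rather than images, the encoder is a single linear layer standing in for a transformer, fusion by averaging is an assumption, and the recurrent cell uses LSTM-style gates without the convolutions of a real ConvLSTM.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class SharedEncoder:
    """Hypothetical stand-in for a weight-shared spatial encoder.

    A single weight matrix is used for both the left and the right view,
    which is the weight-sharing idea; a real model would use transformer
    blocks here instead of one linear layer.
    """

    def __init__(self, in_dim, out_dim, rng):
        self.weights = make_matrix(out_dim, in_dim, rng)

    def __call__(self, frame):
        return [math.tanh(a) for a in matvec(self.weights, frame)]


class GatedMemory:
    """Hypothetical stand-in for a ConvLSTM cell (LSTM gates, no convolutions)."""

    def __init__(self, dim, rng):
        self.dim = dim
        # One weight matrix per gate; each sees [input, previous hidden state].
        self.w_in, self.w_forget, self.w_out, self.w_cand = (
            make_matrix(dim, 2 * dim, rng) for _ in range(4)
        )

    def step(self, x, h, c):
        z = x + h  # list concatenation of input and previous hidden state
        i = [sigmoid(a) for a in matvec(self.w_in, z)]      # input gate
        f = [sigmoid(a) for a in matvec(self.w_forget, z)]  # forget gate
        o = [sigmoid(a) for a in matvec(self.w_out, z)]     # output gate
        g = [math.tanh(a) for a in matvec(self.w_cand, z)]  # candidate state
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new


def run_dual_stream(left_frames, right_frames, encoder, memory):
    """Encode both views with shared weights, fuse, and update memory per frame."""
    h = [0.0] * memory.dim
    c = [0.0] * memory.dim
    outputs = []
    for left, right in zip(left_frames, right_frames):
        feat_left = encoder(left)
        feat_right = encoder(right)  # same weights as the left view
        # Assumed fusion rule: element-wise average of the two streams.
        fused = [0.5 * (a + b) for a, b in zip(feat_left, feat_right)]
        h, c = memory.step(fused, h, c)
        outputs.append(h)
    return outputs


if __name__ == "__main__":
    rng = random.Random(0)
    encoder = SharedEncoder(in_dim=6, out_dim=4, rng=rng)
    memory = GatedMemory(dim=4, rng=rng)
    left = [[rng.random() for _ in range(6)] for _ in range(3)]
    right = [[rng.random() for _ in range(6)] for _ in range(3)]
    for t, state in enumerate(run_dual_stream(left, right, encoder, memory)):
        print(t, [round(v, 3) for v in state])
```

Because the encoder weights are shared and the assumed fusion is a symmetric average, swapping the left and right streams leaves the output unchanged in this sketch. A real binocular model would normally keep view-specific information, for example through cross-view attention.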
About the journal:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.