Triplane-Smoothed Video Dehazing with CLIP-Enhanced Generalization
Jingjing Ren, Haoyu Chen, Tian Ye, Hongtao Wu, Lei Zhu
International Journal of Computer Vision, published 2024-08-01
DOI: 10.1007/s11263-024-02161-0
Citations: 0
Abstract
Video dehazing is a critical research area in computer vision that aims to enhance the quality of hazy frames, benefiting many downstream tasks, e.g., semantic segmentation. Recent works devise CNN-based structures or attention mechanisms to fuse temporal information, while others utilize offsets between frames to align frames explicitly. Another significant line of video dehazing research focuses on constructing paired datasets by synthesizing foggy effects on clear videos or generating real haze effects in indoor scenes. Despite the significant contributions of these dehazing networks and datasets to the advancement of video dehazing, current methods still suffer from spatial–temporal inconsistency and poor generalization. We address these issues by proposing a triplane smoothing module that explicitly exploits the spatial–temporal smoothness prior of the input video to generate temporally coherent dehazing results. We further devise a query-based decoder that extracts haze-relevant information while implicitly aggregating temporal clues. To increase the generalization ability of our dehazing model, we utilize CLIP guidance, which provides a rich, high-level understanding of haze effects. We conduct extensive experiments verifying that our model generates spatial–temporally consistent dehazing results and produces pleasing results on real-world data.
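The general triplane idea behind the smoothing module can be illustrated with a minimal sketch. Note this is a hypothetical illustration of a triplane smoothness prior, not the authors' implementation: a video volume V(t, y, x) is projected onto its three axis-aligned planes by averaging, and the planes are broadcast back and combined into a spatio-temporally smooth approximation.

```python
import numpy as np

def triplane_smooth(video):
    """Illustrative (hypothetical) triplane smoothness prior.

    video: array of shape (T, H, W). The volume is projected onto
    its three axis-aligned planes by averaging along each axis,
    then the planes are broadcast back to (T, H, W) and averaged,
    yielding a spatio-temporally smooth approximation.
    """
    yx = video.mean(axis=0)  # spatial plane, shape (H, W)
    tx = video.mean(axis=1)  # time-horizontal plane, shape (T, W)
    ty = video.mean(axis=2)  # time-vertical plane, shape (T, H)
    # Broadcast each plane along its missing axis and combine.
    smooth = (yx[None, :, :] + tx[:, None, :] + ty[:, :, None]) / 3.0
    return smooth

# A temporally constant video is a fixed point of the smoothing,
# which is the behavior a smoothness prior should preserve.
v = np.full((4, 3, 3), 2.0)
out = triplane_smooth(v)
```

Because each plane is a low-rank projection of the volume, reconstructions built from the planes cannot flicker frame to frame, which is one way a triplane representation encourages temporal coherence.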
Journal Description:
The International Journal of Computer Vision (IJCV) serves as a platform for sharing new research findings in the rapidly growing field of computer vision. It publishes 12 issues annually and presents high-quality, original contributions to the science and engineering of computer vision. The journal encompasses various types of articles to cater to different research outputs.
Regular articles, which span up to 25 journal pages, focus on significant technical advancements that are of broad interest to the field. These articles showcase substantial progress in computer vision.
Short articles, limited to 10 pages, offer a swift publication path for novel research outcomes, providing a quicker means of sharing new findings with the computer vision community.
Survey articles, comprising up to 30 pages, critically evaluate the current state of the art in computer vision or provide tutorial presentations of relevant topics, giving comprehensive and insightful overviews of specific subject areas.
In addition to technical articles, the journal also includes book reviews, position papers, and editorials by prominent scientific figures. These contributions serve to complement the technical content and provide valuable perspectives.
The journal encourages authors to include supplementary material online, such as images, video sequences, data sets, and software. This additional material enhances the understanding and reproducibility of the published research.
Overall, the International Journal of Computer Vision is a comprehensive publication that caters to researchers in this rapidly growing field. It covers a range of article types, offers additional online resources, and facilitates the dissemination of impactful research.