Decoupling Image Deblurring Into Twofold: A Hierarchical Model for Defocus Deblurring

Impact factor: 4.2 · Zone 2 (Computer Science) · JCR Q2 (Engineering, Electrical & Electronic)

IEEE Transactions on Computational Imaging. Published: 2024-08-15. DOI: 10.1109/TCI.2024.3443732

Pengwei Liang, Junjun Jiang, Xianming Liu, Jiayi Ma

Vol. 10, pp. 1207-1220. IEEE Xplore: https://ieeexplore.ieee.org/document/10637737/

Citations: 0

Abstract

Defocus deblurring remains a challenging problem, especially when scene depth causes spatially varying blur. Recent network architectures have focused mainly on recovering high-frequency details, yet scene understanding remains essential for deblurring. A key part of that understanding is contextual information: high-level semantic cues that capture the scene context and object outlines. Recognizing and exploiting these cues can substantially improve image recovery. On this basis, we propose a method that integrates spatial details and contextual information for defocus deblurring. Specifically, we introduce a hierarchical model built on the Vision Transformer (ViT) that encodes both spatial details and contextual information. Our approach decouples the complex deblurring task into two distinct subtasks. The first is handled by a primary feature encoder that transforms blurred images into detailed representations. The second is handled by a contextual encoder that produces abstract, sharp representations from the primary ones. A decoder then merges the outputs of the two encoders to reconstruct the sharp target image. Evaluations on multiple defocus deblurring datasets show that the proposed method achieves compelling performance.
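The two-subtask decomposition described in the abstract can be illustrated with a toy NumPy sketch of the data flow. This is not the authors' model: all names (`ToyHierarchicalDeblurrer`, `patchify`, `avg_pool_tokens`, `upsample_tokens`) are hypothetical, the weights are random and untrained, each "encoder" is a single self-attention block standing in for a ViT stage, and the 2x2 token pooling that makes the contextual stream coarser is an assumption. It only shows how a primary encoder could feed a contextual encoder, with a decoder merging both streams back into an image.

```python
import numpy as np


def patchify(img, p):
    """Split an (H, W, C) image into non-overlapping p x p patches -> (N, p*p*C)."""
    h, w, c = img.shape
    x = img.reshape(h // p, p, w // p, p, c).transpose(0, 2, 1, 3, 4)
    return x.reshape((h // p) * (w // p), p * p * c)


def unpatchify(tokens, p, h, w, c):
    """Inverse of patchify: (N, p*p*C) -> (H, W, C)."""
    x = tokens.reshape(h // p, w // p, p, p, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(h, w, c)


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention with a residual connection."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability for softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return x + attn @ v


def avg_pool_tokens(tokens, gh, gw):
    """2x2 average pooling over a row-major (gh, gw) token grid (assumed coarsening)."""
    d = tokens.shape[-1]
    return tokens.reshape(gh // 2, 2, gw // 2, 2, d).mean(axis=(1, 3)).reshape(-1, d)


def upsample_tokens(tokens, gh, gw):
    """Nearest-neighbour upsampling of a (gh/2, gw/2) token grid back to (gh, gw)."""
    d = tokens.shape[-1]
    g = tokens.reshape(gh // 2, gw // 2, d).repeat(2, axis=0).repeat(2, axis=1)
    return g.reshape(-1, d)


class ToyHierarchicalDeblurrer:
    """Hypothetical toy: primary encoder -> contextual encoder -> merging decoder."""

    def __init__(self, patch=4, channels=3, dim=16, seed=0):
        rng = np.random.default_rng(seed)

        def init(*shape):
            return rng.normal(0.0, 0.02, size=shape)

        self.p, self.c = patch, channels
        self.embed = init(patch * patch * channels, dim)
        self.primary = [init(dim, dim) for _ in range(3)]  # Q, K, V weights
        self.context = [init(dim, dim) for _ in range(3)]
        self.head = init(2 * dim, patch * patch * channels)

    def __call__(self, blurred):
        # H and W must be divisible by 2 * patch for the 2x2 token pooling.
        h, w, _ = blurred.shape
        gh, gw = h // self.p, w // self.p
        # Subtask 1: primary encoder turns the blurred image into detailed tokens.
        primary = self_attention(patchify(blurred, self.p) @ self.embed, *self.primary)
        # Subtask 2: contextual encoder derives coarser, abstract tokens from them.
        context = self_attention(avg_pool_tokens(primary, gh, gw), *self.context)
        # Decoder: concatenate both streams per patch and project back to pixels.
        merged = np.concatenate([primary, upsample_tokens(context, gh, gw)], axis=-1)
        return unpatchify(merged @ self.head, self.p, h, w, self.c)


if __name__ == "__main__":
    blurred = np.random.default_rng(1).random((32, 32, 3))
    sharp_estimate = ToyHierarchicalDeblurrer()(blurred)
    print(sharp_estimate.shape)  # (32, 32, 3)
```

Keeping the primary tokens at full token resolution while the contextual stream runs on a pooled grid is one simple way to separate "detailed" from "abstract" representations; the paper's actual hierarchy, losses, and training procedure are not described in the abstract and are not reproduced here.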




Source journal: IEEE Transactions on Computational Imaging

Subject category: Mathematics / Computational Mathematics

CiteScore: 8.20

Self-citation rate: 7.40%


Journal description: The IEEE Transactions on Computational Imaging will publish articles where computation plays an integral role in the image formation process. Papers will cover all areas of computational imaging ranging from fundamental theoretical methods to the latest innovative computational imaging system designs. Topics of interest will include advanced algorithms and mathematical techniques, model-based data inversion, methods for image and signal recovery from sparse and incomplete data, techniques for non-traditional sensing of image data, methods for dynamic information acquisition and extraction from imaging sensors, software and hardware for efficient computation in imaging systems, and highly novel imaging system design.