Boosting Restoration of Turbulence-Degraded Images With State Space Conditional Diffusion

IF 4.6, Zone 2 (Engineering and Technology), Q1 ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Transactions on Signal Processing Pub Date : 2025-06-23 DOI:10.1109/TSP.2025.3580723

Yubo Wu;Kuanhong Cheng;Tingting Chai;Gengyu Lyu;Shuping Zhao;Wei Jia

IEEE Transactions on Signal Processing, vol. 73, pp. 2631-2645. Journal Article. https://ieeexplore.ieee.org/document/11046205/

Citations: 0

Abstract

Recovering fine details from turbulence-distorted images is highly challenging because the distortion process is complex, spatially varying, and stochastic. Conventional multi-frame methods rely on extracting and averaging clear regions from pre-aligned frames, but their effectiveness is limited by the rarity of “lucky regions”. In contrast, learning-based methods have shown superior performance across various vision tasks. However, existing deep learning approaches still face key limitations: (1) they struggle to efficiently model the global context required for correcting pixel dispersion caused by spatially varying Point Spread Functions (PSFs); (2) they often overlook the physical formation of turbulence, particularly the spatial-frequency relationship between phase distortions and PSFs; and (3) they rely on deterministic architectures that fail to capture the inherent uncertainty in turbulence, leading to visually implausible outputs. To address these issues, we propose the Two-Stage Turbulence Removal Network (TSTRNet). The first stage uses a UNet-based generator built on the State Space Model to perform efficient, coarse global restoration. The second stage refines the output through a Denoising Diffusion Probabilistic Model, introducing stochasticity and edge-guided conditioning to enhance detail and realism. Both stages incorporate frequency-domain processing to align with the physical characteristics of turbulence. Experimental results on multiple benchmark datasets demonstrate that TSTRNet achieves superior restoration performance compared to state-of-the-art methods, with strong generalization from synthetic to real-world scenarios.
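
Limitation (2) above refers to the spatial-frequency relationship between phase distortions and PSFs. As a rough illustration of that relationship, the sketch below uses the standard Fourier-optics (Fraunhofer) model, in which the incoherent PSF is the squared magnitude of the Fourier transform of the pupil function. It is not taken from the paper and says nothing about how TSTRNet itself is built. The names `psf_from_phase` and `circular_aperture`, the grid size, and the Gaussian random phase are all assumptions made for illustration; in particular, the random phase is a toy stand-in, not a physically accurate turbulence phase screen.

```python
import numpy as np


def circular_aperture(n, radius):
    """Binary circular pupil on an n x n grid (hypothetical helper, n even)."""
    y, x = np.mgrid[-n // 2 : n // 2, -n // 2 : n // 2]
    return (x**2 + y**2 <= radius**2).astype(float)


def psf_from_phase(phase, aperture):
    """Incoherent PSF from a pupil-plane phase map (Fraunhofer approximation)."""
    # Generalized pupil function: aperture amplitude times the phase term.
    pupil = aperture * np.exp(1j * phase)
    # Focal-plane field is the 2-D Fourier transform of the pupil.
    field = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(pupil)))
    # Intensity PSF, normalized to unit total energy.
    psf = np.abs(field) ** 2
    return psf / psf.sum()


n = 64
aperture = circular_aperture(n, radius=12)
rng = np.random.default_rng(0)

# Flat phase gives the diffraction-limited PSF.
flat_psf = psf_from_phase(np.zeros((n, n)), aperture)

# Toy random phase (assumption: i.i.d. Gaussian, NOT a turbulence model).
toy_phase = rng.normal(scale=1.5, size=(n, n))
distorted_psf = psf_from_phase(toy_phase, aperture)

# A distorted phase spreads energy away from the central peak.
print("peak (flat phase):     ", flat_psf.max())
print("peak (distorted phase):", distorted_psf.max())
```

Because a small change in pupil-plane phase reshapes the whole PSF, the blur at each image location depends on frequency-domain structure rather than on a fixed local kernel, which is one way to read the abstract's motivation for frequency-domain processing in both stages.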


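Source journal

IEEE Transactions on Signal Processing (Engineering and Technology: Engineering, Electrical & Electronic)

CiteScore: 11.20

Self-citation rate: 9.30%

Articles published: 310

Review time: 3.0 months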

Journal description: The IEEE Transactions on Signal Processing covers novel theory, algorithms, performance analyses and applications of techniques for the processing, understanding, learning, retrieval, mining, and extraction of information from signals. The term “signal” includes, among others, audio, video, speech, image, communication, geophysical, sonar, radar, medical and musical signals. Examples of topics of interest include, but are not limited to, information processing and the theory and application of filtering, coding, transmitting, estimating, detecting, analyzing, recognizing, synthesizing, recording, and reproducing signals.