FS-Diff: Semantic guidance and clarity-aware simultaneous multimodal image fusion and super-resolution

IF 14.7 | CAS Zone 1, Computer Science | JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Information Fusion | Pub Date: 2025-04-02 | DOI: 10.1016/j.inffus.2025.103146

Yuchan Jie, Yushen Xu, Xiaosong Li, Fuqiang Zhou, Jianming Lv, Huafeng Li

Information Fusion, Volume 121, Article 103146. Journal Article. https://www.sciencedirect.com/science/article/pii/S1566253525002192

Citations: 0

Abstract

As an influential information fusion and low-level vision technique, image fusion integrates complementary information from source images to yield an informative fused image. A few attempts have been made in recent years to jointly realize image fusion and super-resolution. However, in real-world applications such as military reconnaissance and long-range detection missions, the target and background structures in multimodal images are easily corrupted, with low resolution and weak semantic information, which leads to suboptimal results from current fusion techniques. In response, we propose FS-Diff, a semantic guidance and clarity-aware joint image fusion and super-resolution method. FS-Diff unifies image fusion and super-resolution as a conditional generation problem. It leverages semantic guidance from the proposed clarity sensing mechanism for adaptive low-resolution perception and cross-modal feature extraction. Specifically, we initialize the desired fused result as pure Gaussian noise and introduce the bidirectional feature Mamba to extract the global features of the multimodal images. Moreover, utilizing the source images and semantics as conditions, we implement a random iterative denoising process via a modified U-Net network. This network is trained for denoising at multiple noise levels to produce high-resolution fusion results with cross-modal features and abundant semantic information. We also construct a powerful aerial view multiscene (AVMS) benchmark covering 600 pairs of images. Extensive joint image fusion and super-resolution experiments on six public datasets and our AVMS dataset demonstrate that FS-Diff outperforms the state-of-the-art methods at multiple magnifications and can recover richer details and semantics in the fused images. The code is available at https://github.com/XylonXu01/FS-Diff.
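The abstract frames fusion plus super-resolution as conditional generation: start from pure Gaussian noise and iteratively denoise while conditioning on the source images. The sketch below illustrates only that general idea using standard DDPM ancestral sampling in NumPy. It is not the authors' implementation. The names `make_schedule`, `toy_denoiser`, and `sample_fused`, the linear noise schedule, the nearest-neighbour upsampling, and the choice to stack the two upsampled sources as the condition are all assumptions made for illustration. The placeholder denoiser stands in for the trained modified U-Net, and the clarity sensing mechanism, semantic guidance, and bidirectional feature Mamba are omitted entirely.

```python
import numpy as np


def make_schedule(num_steps=50, beta_start=1e-4, beta_end=2e-2):
    """Linear beta schedule (an assumption; the paper's schedule is not given here)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def toy_denoiser(x_t, cond, t):
    """Hypothetical stand-in for a trained conditional noise predictor.

    A real model would predict the noise in x_t given the condition and
    the step index; this placeholder simply predicts zero noise.
    """
    return np.zeros_like(x_t)


def sample_fused(src_a_lr, src_b_lr, scale=2, num_steps=50,
                 denoiser=toy_denoiser, seed=0):
    """Run a conditional DDPM reverse process from pure Gaussian noise."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)

    # Nearest-neighbour upsampling of each low-resolution source image.
    def upsample(img):
        return np.kron(img, np.ones((scale, scale)))

    # Condition: both upsampled sources stacked along a channel axis.
    cond = np.stack([upsample(src_a_lr), upsample(src_b_lr)])

    # The fused result is initialized as pure Gaussian noise.
    x = rng.standard_normal(cond.shape[1:])

    for t in range(num_steps - 1, -1, -1):
        eps = denoiser(x, cond, t)
        # Posterior mean of the standard DDPM reverse step.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        # Fresh noise is injected at every step except the last one.
        noise = rng.standard_normal(x.shape) if t > 0 else np.zeros_like(x)
        x = mean + np.sqrt(betas[t]) * noise
    return x


if __name__ == "__main__":
    infrared = np.zeros((4, 4))   # toy low-resolution source image
    visible = np.ones((4, 4))     # toy low-resolution source image
    fused = sample_fused(infrared, visible, scale=2, num_steps=10)
    print(fused.shape)
```

With the placeholder denoiser the output is just noise; the sketch only shows where a trained conditional network would plug into the sampling loop. For the actual method, see the repository linked in the abstract.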


Source journal

Information Fusion (Engineering and Technology; Computer Science: Theory and Methods)

CiteScore

33.20

Self-citation rate

4.30%

Articles published

161

Review time

7.9 months

About the journal: Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.