AFDFusion: An adaptive frequency decoupling fusion network for multi-modality image

IF 7.5 | Zone 1, Computer Science | JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Expert Systems with Applications | Pub Date: 2024-11-12 | DOI: 10.1016/j.eswa.2024.125694

Chengchao Wang, Zhengpeng Zhao, Qiuxia Yang, Rencan Nie, Jinde Cao, Yuanyuan Pu

Expert Systems with Applications, Volume 263, Article 125694. Full text: https://www.sciencedirect.com/science/article/pii/S0957417424025612

Citations: 0


AFDFusion: An adaptive frequency decoupling fusion network for multi-modality image

The goal of multi-modality image fusion is to create a single image that provides a comprehensive scene description and conforms to visual perception by integrating the complementary strengths of the different modalities, e.g., the salient intensities of infrared images and the detailed textures of visible images. Although some works explore decoupled representations of multi-modality images, they struggle with complex nonlinear relationships, fine-grained modal decoupling, and noise handling. To address these issues, we propose an adaptive frequency decoupling module that captures the associative invariant (shared) and inherent specific (modality-specific) information across modalities by dynamically adjusting the learnable low-frequency weight of the kernel. Specifically, we use a contrastive learning loss to restrict the solution space of feature decoupling, so that the network learns both invariant and specific representations of the multi-modality images. The underlying idea is that, during decoupling, low-frequency features, which are similar in the representation space, should be pulled closer together, signifying the associative invariant, while high-frequency features should be pushed farther apart, indicating the intrinsic specific. Additionally, a multi-stage training strategy is introduced into our framework to achieve decoupling and fusion. In Stage I, MixEncoder and MixDecoder, which share the same architecture but have different parameters, are trained to perform decoupling and reconstruction under the supervision of the contrastive self-supervised mechanism. In Stage II, two feature fusion modules are added to integrate the invariant and specific features and output the fused image. Extensive experiments demonstrate the superiority of the proposed method over state-of-the-art methods in both qualitative and quantitative evaluations on two multi-modality image fusion tasks.
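
The abstract does not specify the decoupling module or the contrastive loss in detail, so the following is only a toy sketch of the stated idea on 1-D signals, not the authors' implementation. Everything named here is an assumption made for illustration: the 3-tap mean filter, the scalar `alpha` standing in for the learnable low-frequency weight, the helper names `low_pass`, `decouple`, `cosine`, and `decoupling_loss`, and the cosine-based objective used in place of a real contrastive loss.

```python
import math


def low_pass(signal, alpha):
    """Blend each sample with its 3-tap local mean.

    alpha in [0, 1] is a hypothetical stand-in for a learnable
    low-frequency weight: alpha = 0 leaves the signal unchanged,
    alpha = 1 returns the pure local mean.
    """
    n = len(signal)
    out = []
    for i in range(n):
        left = signal[max(i - 1, 0)]       # replicate the border sample
        right = signal[min(i + 1, n - 1)]
        mean = (left + signal[i] + right) / 3.0
        out.append((1.0 - alpha) * signal[i] + alpha * mean)
    return out


def decouple(signal, alpha):
    """Split a signal into low- and high-frequency parts that sum back to it."""
    low = low_pass(signal, alpha)
    high = [s - l for s, l in zip(signal, low)]
    return low, high


def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def decoupling_loss(x_a, x_b, alpha):
    """Toy objective (assumed, not the paper's loss).

    The pull term is small when the low-frequency parts of the two
    modalities agree; the push term is small when their high-frequency
    parts are dissimilar.
    """
    low_a, high_a = decouple(x_a, alpha)
    low_b, high_b = decouple(x_b, alpha)
    pull = 1.0 - cosine(low_a, low_b)
    push = max(0.0, cosine(high_a, high_b))
    return pull + push
```

In this sketch, two signals that share a coarse trend but carry opposite fine texture, such as `[1.0, 2.0, 1.0, 2.0]` and `[2.0, 1.0, 2.0, 1.0]`, score a lower loss than two identical signals, because identical high-frequency parts are penalised by the push term. In a trainable version, `alpha` would be a parameter updated by gradient descent rather than a fixed argument.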


Source journal

Expert Systems with Applications (Engineering & Technology - Engineering: Electrical & Electronic)

CiteScore: 13.80

Self-citation rate: 10.60%

Articles published: 2045

Review time: 8.7 months

About the journal: Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.