AFDFusion: An adaptive frequency decoupling fusion network for multi-modality image
Chengchao Wang, Zhengpeng Zhao, Qiuxia Yang, Rencan Nie, Jinde Cao, Yuanyuan Pu
Expert Systems with Applications, Volume 263, Article 125694. Published 2024-11-12. Journal Article.
DOI: 10.1016/j.eswa.2024.125694
https://www.sciencedirect.com/science/article/pii/S0957417424025612
Impact factor: 7.5 (JCR Q1, Computer Science, Artificial Intelligence)
Citations: 0
Abstract
The goal of multi-modality image fusion is to create a single image that provides a comprehensive scene description and conforms to visual perception by integrating the complementary strengths of the different modalities, e.g., the salient intensities of infrared images and the detailed textures of visible images. Although some works explore decoupled representations of multi-modality images, they struggle with complex nonlinear relationships, fine-grained modal decoupling, and noise handling. To address these issues, we propose an adaptive frequency decoupling module that perceives the associative invariant and the inherent specific information across modalities by dynamically adjusting the learnable low-frequency weight of the kernel. Specifically, we use a contrastive learning loss to restrict the solution space of feature decoupling, so that the network learns representations of both the invariant and the specific components of the multi-modality images. The underlying idea is that, in decoupling, low-frequency features, which are similar in the representation space, should be pulled closer together, signifying the associative invariant, while high-frequency features should be pushed farther apart, indicating the intrinsic specific. Additionally, a multi-stage training scheme is introduced into our framework to achieve decoupling and fusion. In Stage I, MixEncoder and MixDecoder, which share the same architecture but have different parameters, are trained to perform decoupling and reconstruction under the supervision of the contrastive self-supervised mechanism. In Stage II, two feature fusion modules are added to integrate the invariant and specific features and output the fused image. Extensive experiments demonstrate the superiority of the proposed method over state-of-the-art methods in both qualitative and quantitative evaluations on two multi-modal image fusion tasks.
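The decoupling idea in the abstract can be illustrated with a small toy sketch. It is not the authors' implementation: the abstract does not give the kernel or the loss formula, so the 3-tap smoothing kernel, the scalar `weight` standing in for the learnable low-frequency weight, the cosine similarity, the InfoNCE-style loss form, the temperature `tau`, and the function names `low_pass`, `decouple`, and `contrastive_loss` are all assumptions made for illustration. Signals are 1-D lists for brevity.

```python
import math


def low_pass(signal, weight):
    """Blend each sample with its 3-tap neighbourhood mean.

    `weight` in [0, 1] is a hypothetical stand-in for a learnable
    low-frequency weight: 0 leaves the signal unchanged, 1 replaces
    each sample with the local mean (edges are replicated).
    """
    n = len(signal)
    out = []
    for i in range(n):
        left = signal[max(i - 1, 0)]
        right = signal[min(i + 1, n - 1)]
        mean = (left + signal[i] + right) / 3.0
        out.append(weight * mean + (1.0 - weight) * signal[i])
    return out


def decouple(signal, weight):
    """Split a signal into a low-frequency part and the high-frequency residual."""
    low = low_pass(signal, weight)
    high = [s - l for s, l in zip(signal, low)]
    return low, high


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)


def contrastive_loss(low_a, low_b, high_a, high_b, tau=0.5):
    """InfoNCE-style loss (assumed form, not taken from the paper).

    The cross-modality low-frequency pair is treated as the positive
    (pulled together); the high-frequency pair is treated as the
    negative (pushed apart).
    """
    pos = math.exp(cosine(low_a, low_b) / tau)
    neg = math.exp(cosine(high_a, high_b) / tau)
    return -math.log(pos / (pos + neg))


# Toy "infrared" and "visible" signals (made-up values).
infrared = [0.2, 0.9, 0.8, 0.1, 0.3, 0.7]
visible = [0.3, 0.5, 0.9, 0.2, 0.6, 0.4]

low_ir, high_ir = decouple(infrared, weight=0.8)
low_vis, high_vis = decouple(visible, weight=0.8)
loss = contrastive_loss(low_ir, low_vis, high_ir, high_vis)
```

In this sketch the loss decreases as the two low-frequency parts become more similar and as the two high-frequency parts become less similar, which mirrors the pull/push behaviour the abstract describes. In a trained network, `weight` would be a parameter updated by gradient descent rather than a fixed constant.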
About the journal:
Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.