KDFuse: A high-level vision task-driven infrared and visible image fusion method based on cross-domain knowledge distillation

IF 14.7, Zone 1 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Information Fusion Pub Date : 2025-01-14 DOI:10.1016/j.inffus.2025.102944

Chenjia Yang, Xiaoqing Luo, Zhancheng Zhang, Zhiguo Chen, Xiao-jun Wu


Citations: 0

Abstract

To make fusion features more comprehensive and to meet the requirements of high-level vision tasks, some fusion methods coordinate the fusion process by interacting directly with high-level semantic features. However, because the high-level semantic domain and the fusion representation domain differ significantly, such direct interaction leaves room for more effective collaboration. To address this gap, a high-level vision task-driven infrared and visible image fusion method based on cross-domain knowledge distillation, referred to as KDFuse, is proposed. KDFuse brings multi-task perceptual representations into the same domain through cross-domain knowledge distillation. By letting semantic information and fusion information interact at an equivalent level, it reduces the gap between the semantic and fusion domains and enables multi-task collaborative fusion. Specifically, to acquire the high-level semantic representations needed to instruct the fusion network, the multi-domain interaction distillation module (MIDM) establishes a teaching relationship that realizes multi-task collaboration. The multi-scale semantic perception module (MSPM) learns to capture semantic information through cross-domain knowledge distillation, and the semantic detail integration module (SDIM) integrates the fusion-level semantic representations with the fusion-level visual representations. Moreover, to balance the semantic and visual representations during fusion, the Fourier transform is introduced into the loss function. Extensive experiments demonstrate the effectiveness of the proposed method in both image fusion and downstream tasks. The source code is available at https://github.com/lxq-jnu/KDFuse.
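The abstract names two loss-related ideas, feature-level knowledge distillation and a Fourier-transform term in the loss, without giving their exact form. The sketch below illustrates one common way such terms are written; it is not taken from the paper or its repository. The function names, the mean-squared-error distillation term, the amplitude-spectrum comparison, and the `weight` parameter are all assumptions made for illustration.

```python
import numpy as np


def feature_distillation_loss(student_feat, teacher_feat):
    """Mean squared error between student and teacher feature maps.

    Assumed form: the paper's actual distillation objective may differ.
    """
    student_feat = np.asarray(student_feat, dtype=np.float64)
    teacher_feat = np.asarray(teacher_feat, dtype=np.float64)
    if student_feat.shape != teacher_feat.shape:
        raise ValueError("student and teacher features must share a shape")
    return float(np.mean((student_feat - teacher_feat) ** 2))


def fourier_amplitude_loss(fused, reference):
    """Mean absolute difference between 2-D FFT amplitude spectra.

    Assumed form: only amplitudes are compared, so phase is ignored.
    """
    fused_amp = np.abs(np.fft.fft2(np.asarray(fused, dtype=np.float64)))
    ref_amp = np.abs(np.fft.fft2(np.asarray(reference, dtype=np.float64)))
    return float(np.mean(np.abs(fused_amp - ref_amp)))


def combined_loss(fused, reference, student_feat, teacher_feat, weight=0.5):
    """Weighted sum of the two hypothetical terms above."""
    return (feature_distillation_loss(student_feat, teacher_feat)
            + weight * fourier_amplitude_loss(fused, reference))
```

Because this sketch compares amplitude spectra only, a circularly shifted copy of an image yields a loss of zero against the original. A real fusion loss would normally pair such a term with spatial or phase-aware terms.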


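Source journal

Information Fusion (Engineering and Technology, Computer Science: Theory and Methods)

CiteScore: 33.20

Self-citation rate: 4.30%

Articles published: 161

Review time: 7.9 months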

Journal description: Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.