Yun Li , Ningyuan Zhao , Xue Yang , Liping Luo , Peiguang Jing
{"title":"RSDC-Net: Robust self-supervised dynamic collaboration network for infrared and visible image fusion","authors":"Yun Li , Ningyuan Zhao , Xue Yang , Liping Luo , Peiguang Jing","doi":"10.1016/j.knosys.2025.114541","DOIUrl":null,"url":null,"abstract":"<div><div>Infrared and visible image fusion (IVIF) aims to integrate complementary information from distinct sensors, yielding fused results that outperform the capabilities of either individual modality alone. Due to inherent modality bias, conventional fusion-reconstruction frameworks often struggle to effectively prioritize the representation of critical shared regions and diverse heterogeneous areas, while also showing deficiencies in shallow feature interactions. To address these challenges, we propose a robust self-supervised dynamic collaboration network (RSDC-Net), which adaptively and comprehensively selects complementary cues from both infrared and visible modalities. Specifically, we introduce a steady-state contrastive autoencoder that leverages a multi-task self-supervised strategy to enhance the robust representation of key shared cues in the mixed information flow. This strategy promotes deep cross-modal modeling of global dependencies across sources, thereby achieving semantic consistency. Furthermore, we design a latent inter-modal focus-guided module that integrates a bilateral transposed attention mechanism with a dynamic selection component to refine local-level heterogeneous cue allocation under the guidance of mutual global dependencies. Notably, a gated feed-forward unit is incorporated to filter outlier information flows across modalities. Quantitative results on the MSRS, TNO, and M3FD datasets demonstrate that RSDC-Net achieves the best performance on most of the eight evaluation metrics. Meanwhile, it also exhibits superior performance in qualitative visual assessments on these datasets as well as under challenging scenarios.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"330 ","pages":"Article 114541"},"PeriodicalIF":7.6000,"publicationDate":"2025-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Knowledge-Based Systems","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0950705125015801","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
Infrared and visible image fusion (IVIF) aims to integrate complementary information from distinct sensors, yielding fused results that surpass what either modality can provide alone. Due to inherent modality bias, conventional fusion-reconstruction frameworks often struggle to effectively prioritize the representation of critical shared regions and diverse heterogeneous areas, and they also exhibit deficiencies in shallow feature interaction. To address these challenges, we propose a robust self-supervised dynamic collaboration network (RSDC-Net), which adaptively and comprehensively selects complementary cues from both infrared and visible modalities. Specifically, we introduce a steady-state contrastive autoencoder that leverages a multi-task self-supervised strategy to enhance the robust representation of key shared cues in the mixed information flow. This strategy promotes deep cross-modal modeling of global dependencies across sources, thereby achieving semantic consistency. Furthermore, we design a latent inter-modal focus-guided module that integrates a bilateral transposed attention mechanism with a dynamic selection component to refine local-level heterogeneous cue allocation under the guidance of mutual global dependencies. Notably, a gated feed-forward unit is incorporated to filter outlier information flows across modalities. Quantitative results on the MSRS, TNO, and M3FD datasets demonstrate that RSDC-Net achieves the best performance on most of the eight evaluation metrics. It also exhibits superior performance in qualitative visual assessments on these datasets and under challenging scenarios.
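The abstract does not give implementation details, but the gated feed-forward unit mentioned above can be illustrated with a minimal sketch: one branch produces features while a parallel branch produces an elementwise gate that suppresses unreliable (outlier) responses before fusion. The module name, layer sizes, depthwise convolution, and sigmoid gating below are assumptions made for illustration, not the authors' actual design.

```python
import torch
import torch.nn as nn

class GatedFeedForward(nn.Module):
    """Illustrative gated feed-forward unit (assumed design, not the paper's).
    A value branch and a gate branch are produced jointly; the gate modulates
    the value elementwise so that outlier activations are attenuated."""

    def __init__(self, dim: int, expansion: int = 2):
        super().__init__()
        hidden = dim * expansion
        # Joint pointwise projection for the value and gate branches.
        self.proj_in = nn.Conv2d(dim, hidden * 2, kernel_size=1)
        # Depthwise convolution for light spatial mixing within each branch.
        self.dwconv = nn.Conv2d(hidden * 2, hidden * 2, kernel_size=3,
                                padding=1, groups=hidden * 2)
        self.proj_out = nn.Conv2d(hidden, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        value, gate = self.dwconv(self.proj_in(x)).chunk(2, dim=1)
        # Sigmoid gate filters unreliable cross-modal responses.
        return self.proj_out(value * torch.sigmoid(gate))


# Usage on fusion-stage features of shape (batch, channels, H, W).
feats = torch.randn(1, 64, 32, 32)
out = GatedFeedForward(dim=64)(feats)
print(out.shape)  # torch.Size([1, 64, 32, 32])
```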
Journal Overview
Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on systems built with knowledge-based and other artificial intelligence techniques. The journal aims to support human prediction and decision-making through data science and computational techniques, to provide balanced coverage of theory and practical study, and to encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.