Yun Li , Ningyuan Zhao , Xue Yang , Liping Luo , Peiguang Jing
{"title":"RSDC-Net: Robust self-supervised dynamic collaboration network for infrared and visible image fusion","authors":"Yun Li , Ningyuan Zhao , Xue Yang , Liping Luo , Peiguang Jing","doi":"10.1016/j.knosys.2025.114541","DOIUrl":null,"url":null,"abstract":"<div><div>Infrared and visible image fusion (IVIF) aims to integrate complementary information from distinct sensors, yielding fused results that outperform the capabilities of either individual modality alone. Due to inherent modality bias, conventional fusion-reconstruction frameworks often struggle to effectively prioritize the representation of critical shared regions and diverse heterogeneous areas, while also showing deficiencies in shallow feature interactions. To address these challenges, we propose a robust self-supervised dynamic collaboration network (RSDC-Net), which adaptively and comprehensively selects complementary cues from both infrared and visible modalities. Specifically, we introduce a steady-state contrastive autoencoder that leverages a multi-task self-supervised strategy to enhance the robust representation of key shared cues in the mixed information flow. This strategy promotes deep cross-modal modeling of global dependencies across sources, thereby achieving semantic consistency. Furthermore, we design a latent inter-modal focus-guided module that integrates a bilateral transposed attention mechanism with a dynamic selection component to refine local-level heterogeneous cue allocation under the guidance of mutual global dependencies. Notably, a gated feed-forward unit is incorporated to filter outlier information flows across modalities. Quantitative results on the MSRS, TNO, and M3FD datasets demonstrate that RSDC-Net achieves the best performance on most of the eight evaluation metrics. Meanwhile, it also exhibits superior performance in qualitative visual assessments on these datasets as well as under challenging scenarios.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"330 ","pages":"Article 114541"},"PeriodicalIF":7.6000,"publicationDate":"2025-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Knowledge-Based Systems","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0950705125015801","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
Infrared and visible image fusion (IVIF) aims to integrate complementary information from distinct sensors, yielding fused results that surpass what either modality can provide alone. Due to inherent modality bias, conventional fusion-reconstruction frameworks often struggle to effectively prioritize the representation of critical shared regions and diverse heterogeneous areas, and they also exhibit deficiencies in shallow feature interaction. To address these challenges, we propose a robust self-supervised dynamic collaboration network (RSDC-Net), which adaptively and comprehensively selects complementary cues from both infrared and visible modalities. Specifically, we introduce a steady-state contrastive autoencoder that leverages a multi-task self-supervised strategy to enhance the robust representation of key shared cues in the mixed information flow. This strategy promotes deep cross-modal modeling of global dependencies across sources, thereby achieving semantic consistency. Furthermore, we design a latent inter-modal focus-guided module that integrates a bilateral transposed attention mechanism with a dynamic selection component to refine local-level heterogeneous cue allocation under the guidance of mutual global dependencies. Notably, a gated feed-forward unit is incorporated to filter outlier information flows across modalities. Quantitative results on the MSRS, TNO, and M3FD datasets demonstrate that RSDC-Net achieves the best performance on most of the eight evaluation metrics. It also exhibits superior performance in qualitative visual assessments on these datasets and under challenging scenarios.
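The abstract does not give implementation details, but the gated feed-forward unit mentioned above can be illustrated with a minimal sketch: one branch produces features while a parallel branch produces an elementwise gate that suppresses unreliable (outlier) responses before fusion. The module name, layer sizes, depthwise convolution, and sigmoid gating below are assumptions made for illustration, not the authors' actual design.

```python
import torch
import torch.nn as nn

class GatedFeedForward(nn.Module):
    """Illustrative gated feed-forward unit (assumed design, not the paper's).
    A value branch and a gate branch are produced jointly; the gate modulates
    the value elementwise so that outlier activations are attenuated."""

    def __init__(self, dim: int, expansion: int = 2):
        super().__init__()
        hidden = dim * expansion
        # Joint pointwise projection for the value and gate branches.
        self.proj_in = nn.Conv2d(dim, hidden * 2, kernel_size=1)
        # Depthwise convolution for light spatial mixing within each branch.
        self.dwconv = nn.Conv2d(hidden * 2, hidden * 2, kernel_size=3,
                                padding=1, groups=hidden * 2)
        self.proj_out = nn.Conv2d(hidden, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        value, gate = self.dwconv(self.proj_in(x)).chunk(2, dim=1)
        # Sigmoid gate filters unreliable cross-modal responses.
        return self.proj_out(value * torch.sigmoid(gate))


# Usage on fusion-stage features of shape (batch, channels, H, W).
feats = torch.randn(1, 64, 32, 32)
out = GatedFeedForward(dim=64)(feats)
print(out.shape)  # torch.Size([1, 64, 32, 32])
```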
Journal Overview
Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on systems built with knowledge-based and other artificial intelligence techniques. The journal aims to support human prediction and decision-making through data science and computational techniques, to provide balanced coverage of theory and practical study, and to encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.