{"title":"SMDFusion: A Self-Supervised Fusion for Infrared and Visible Images via Cross-Modal Random Noise Masked Encoding and Difference Perception","authors":"Mingchuan Tan;Rencan Nie;Jinde Cao;Ying Zhang","doi":"10.1109/TCE.2025.3565680","DOIUrl":null,"url":null,"abstract":"Infrared and visible image fusion (IVIF) aims to merge images from both modalities of the same scene into a single image, enabling comprehensive information display and better support for visual computing tasks. Nevertheless, existing methods often overlook pixel-level relationships and struggle to effectively eliminate redundant information. To this end, we propose SMDFusion, a novel framework for fusing infrared and visible images using cross-modal noise-masked encoding and cross-modal differential perception information coupling. The framework consists of a self-supervised learning network (SSLN) and an unsupervised fusion network (UFN). Regarding the SSLN, the noise random masked encoder learns pixel-level relationships by employing a grid structure for multi-scale feature mapping that facilitates information exchange among different scales. The network is optimized with a self-supervision strategy for better representation learning. As for the UFN, symmetrical grid structures and multi-scale attention mechanisms are utilized to integrate intra-modal features while the cross-modal difference perception (CDP) module eliminates redundant information between modalities and conditionally captures complementary perception. The fusion image is synthesized by computing the modality-specific contribution estimation. Qualitative and quantitative experimental results demonstrate that SMDFusion outperforms representative methods in the task of multi-modal information fusion as well as supporting downstream tasks. The code is available at:<uri>https://github.com/rcnie/IVIF-SMDFusion</uri>.","PeriodicalId":13208,"journal":{"name":"IEEE Transactions on Consumer Electronics","volume":"71 2","pages":"2579-2591"},"PeriodicalIF":10.9000,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Consumer Electronics","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10979991/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0
Abstract
Infrared and visible image fusion (IVIF) aims to merge images of the same scene from both modalities into a single image, enabling comprehensive information display and better support for visual computing tasks. Nevertheless, existing methods often overlook pixel-level relationships and struggle to effectively eliminate redundant information. To this end, we propose SMDFusion, a novel framework for fusing infrared and visible images via cross-modal random noise masked encoding and cross-modal difference-perception information coupling. The framework consists of a self-supervised learning network (SSLN) and an unsupervised fusion network (UFN). In the SSLN, the random noise masked encoder learns pixel-level relationships through a grid structure for multi-scale feature mapping that facilitates information exchange among scales, and the network is optimized with a self-supervision strategy for better representation learning. In the UFN, symmetric grid structures and multi-scale attention mechanisms integrate intra-modal features, while the cross-modal difference perception (CDP) module eliminates redundant information between modalities and conditionally captures complementary information. The fused image is synthesized by estimating each modality's specific contribution. Qualitative and quantitative experiments demonstrate that SMDFusion outperforms representative methods in multi-modal information fusion as well as in supporting downstream tasks. The code is available at: https://github.com/rcnie/IVIF-SMDFusion.
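To make the two key ideas in the abstract more concrete, the following is a minimal, hypothetical PyTorch sketch (not the authors' code; see the linked repository for the actual SMDFusion implementation). It illustrates (a) replacing random image patches with noise, as a stand-in for the cross-modal random noise masked encoding used for self-supervision, and (b) fusing two modality feature maps with per-pixel contribution weights. The patch size, mask ratio, and the simple activity-based contribution score are illustrative assumptions; the paper instead uses learned grid-structured, multi-scale attention and CDP modules to estimate contributions.

```python
# Hypothetical sketch only -- not the SMDFusion implementation.
import torch
import torch.nn.functional as F


def random_noise_mask(img: torch.Tensor, patch: int = 16, ratio: float = 0.5) -> torch.Tensor:
    """Replace a random subset of non-overlapping patches with Gaussian noise.

    `patch` and `ratio` are assumed values for illustration, not the paper's settings.
    """
    b, c, h, w = img.shape
    gh, gw = h // patch, w // patch
    # Boolean grid: True = keep the original patch, False = overwrite with noise.
    keep = (torch.rand(b, 1, gh, gw, device=img.device) > ratio).float()
    keep = F.interpolate(keep, scale_factor=patch, mode="nearest")
    noise = torch.randn_like(img)
    return img * keep + noise * (1.0 - keep)


def contribution_weighted_fusion(feat_ir: torch.Tensor, feat_vis: torch.Tensor) -> torch.Tensor:
    """Fuse two modality feature maps with softmax contribution weights.

    Here the per-pixel "contribution" is a crude activity score (mean absolute
    response); SMDFusion estimates modality-specific contributions with learned modules.
    """
    score_ir = feat_ir.abs().mean(dim=1, keepdim=True)
    score_vis = feat_vis.abs().mean(dim=1, keepdim=True)
    w = torch.softmax(torch.cat([score_ir, score_vis], dim=1), dim=1)  # (B, 2, H, W)
    return w[:, :1] * feat_ir + w[:, 1:] * feat_vis


if __name__ == "__main__":
    ir = torch.rand(1, 1, 256, 256)
    vis = torch.rand(1, 1, 256, 256)
    masked_vis = random_noise_mask(vis)              # input to a masked-encoding stage
    fused = contribution_weighted_fusion(ir, vis)    # contribution-weighted fusion
    print(masked_vis.shape, fused.shape)
```

In this toy setup the masking step provides the corrupted input that a self-supervised encoder would learn to reconstruct, and the weighting step shows how per-pixel modality contributions can combine complementary infrared and visible responses into one output.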
Journal Introduction:
The main focus of the IEEE Transactions on Consumer Electronics is the engineering and research aspects of the theory, design, construction, manufacture, or end use of mass-market electronics, systems, software, and services for consumers.