SMDFusion: A Self-Supervised Fusion for Infrared and Visible Images via Cross-Modal Random Noise Masked Encoding and Difference Perception

Impact Factor: 10.9 · CAS Zone 2 (Computer Science) · JCR Q1 (Engineering, Electrical & Electronic)
Mingchuan Tan;Rencan Nie;Jinde Cao;Ying Zhang
DOI: 10.1109/TCE.2025.3565680
Journal: IEEE Transactions on Consumer Electronics, vol. 71, no. 2, pp. 2579-2591
Published: 2025-04-29
Article page: https://ieeexplore.ieee.org/document/10979991/
Citations: 0

Abstract

Infrared and visible image fusion (IVIF) aims to merge images of the same scene from both modalities into a single image, enabling comprehensive information display and better support for visual computing tasks. Nevertheless, existing methods often overlook pixel-level relationships and struggle to effectively eliminate redundant information. To this end, we propose SMDFusion, a novel framework for fusing infrared and visible images via cross-modal noise-masked encoding and cross-modal difference-perception information coupling. The framework consists of a self-supervised learning network (SSLN) and an unsupervised fusion network (UFN). In the SSLN, the random noise masked encoder learns pixel-level relationships by employing a grid structure for multi-scale feature mapping that facilitates information exchange among different scales; the network is optimized with a self-supervision strategy for better representation learning. In the UFN, symmetrical grid structures and multi-scale attention mechanisms integrate intra-modal features, while the cross-modal difference perception (CDP) module eliminates redundant information between modalities and conditionally captures complementary perception. The fused image is synthesized by computing modality-specific contribution estimates. Qualitative and quantitative experimental results demonstrate that SMDFusion outperforms representative methods in multi-modal information fusion as well as in supporting downstream tasks. The code is available at: https://github.com/rcnie/IVIF-SMDFusion.
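The two core ideas in the abstract, corrupting pixels with random noise as a masked-encoding pretext task and fusing modalities by per-pixel contribution weights, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the mask ratio, noise scale, and magnitude-based weighting below are illustrative assumptions, not details from the paper.

```python
import numpy as np

def random_noise_mask(img, mask_ratio=0.5, noise_std=0.1, rng=None):
    """Replace a random subset of pixels with Gaussian noise.

    Returns the corrupted image and the boolean mask of corrupted pixels;
    a masked encoder would be trained to reconstruct `img` from the output.
    """
    rng = np.random.default_rng(rng)
    mask = rng.random(img.shape) < mask_ratio
    noise = rng.normal(0.0, noise_std, img.shape)
    return np.where(mask, noise, img), mask

def contribution_fusion(ir_feat, vis_feat):
    """Fuse two modality features as a per-pixel convex combination.

    Weights here are a simple softmax over feature magnitudes; the paper's
    modality-specific contribution estimation is learned, not hand-crafted.
    """
    w_ir = np.exp(np.abs(ir_feat))
    w_vis = np.exp(np.abs(vis_feat))
    return (w_ir * ir_feat + w_vis * vis_feat) / (w_ir + w_vis)

# Toy example on random 4x4 "images" in [0, 1)
ir = np.random.default_rng(1).random((4, 4))
vis = np.random.default_rng(2).random((4, 4))
masked, m = random_noise_mask(ir, mask_ratio=0.25, rng=0)
fused = contribution_fusion(ir, vis)
```

Because the weights are positive and normalized, each fused pixel lies between the corresponding infrared and visible values, which is one simple way to read "contribution estimation" as a pixel-wise weighting scheme.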
Journal metrics: CiteScore 7.70 · Self-citation rate 9.30% · Annual articles 59 · Review time 3.3 months
Journal scope: The main focus of the IEEE Transactions on Consumer Electronics is the engineering and research aspects of the theory, design, construction, manufacture, and end use of mass-market electronics, systems, software, and services for consumers.