CLIP-Based Natural Language-Guided Low-Redundancy Fusion of Infrared and Visible Images

Impact Factor: 4.3 · CAS Zone 2 (Computer Science) · JCR Q1 (Engineering, Electrical & Electronic)
Jundong Zhang;Kangjian He;Dan Xu;Hongzhen Shi
{"title":"CLIP-Based Natural Language-Guided Low-Redundancy Fusion of Infrared and Visible Images","authors":"Jundong Zhang;Kangjian He;Dan Xu;Hongzhen Shi","doi":"10.1109/TCE.2025.3526792","DOIUrl":null,"url":null,"abstract":"The objective of infrared and visible image fusion is to produce a fused image that encompasses significant objects and intricate textures. However, existing methods frequently prioritize the extraction of complementary information, often overlooking the detrimental effects of redundant features. Moreover, due to the absence of authentic fused images, traditional mathematically defined loss functions face challenges in accurately modeling the characteristics of fused images. To address these challenges, this paper utilizes CLIP to design a natural language-guided, low-redundancy feature infrared and visible image fusion network. On one hand, we designed a Partial Feature Extraction(PFE) block and a Spatial-Channel Reconstruction Screening(SCRS) block to effectively reduce redundant features and enhance the focus on critical features. Additionally, we leveraged the CLIP model to bridge the gap between images and natural language, innovatively crafting a language-driven loss function to guide the fusion process through linguistic expressions. Extensive experiments conducted on multiple public datasets demonstrate that this method outperforms existing advanced techniques in both visual quality and quantitative assessment. Moreover, it achieves superior detection accuracy compared to current methods, reaching an advanced level of performance. The source code will be released at <uri>https://github.com/VCMHE/CNLFusion</uri>.","PeriodicalId":13208,"journal":{"name":"IEEE Transactions on Consumer Electronics","volume":"71 1","pages":"931-944"},"PeriodicalIF":4.3000,"publicationDate":"2025-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Consumer Electronics","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10829832/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0

Abstract

The objective of infrared and visible image fusion is to produce a fused image that contains both salient objects and fine textures. However, existing methods often prioritize the extraction of complementary information while overlooking the detrimental effects of redundant features. Moreover, because no ground-truth fused images exist, traditional mathematically defined loss functions struggle to accurately model the desired characteristics of a fused image. To address these challenges, this paper uses CLIP to design a natural language-guided, low-redundancy infrared and visible image fusion network. First, we design a Partial Feature Extraction (PFE) block and a Spatial-Channel Reconstruction Screening (SCRS) block to effectively reduce redundant features and sharpen the focus on critical ones. Second, we leverage the CLIP model to bridge the gap between images and natural language, crafting a language-driven loss function that guides the fusion process through linguistic descriptions. Extensive experiments on multiple public datasets demonstrate that the method outperforms existing state-of-the-art techniques in both visual quality and quantitative assessment, and it achieves superior detection accuracy compared to current methods. The source code will be released at https://github.com/VCMHE/CNLFusion.
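The abstract does not detail the internals of the PFE and SCRS blocks, but their stated goal, reducing redundant features while forwarding critical ones, is in the spirit of partial-convolution designs such as FasterNet's PConv, where only a fraction of the channels are convolved and the rest pass through unchanged. The sketch below is a hypothetical illustration of that general idea, not the authors' implementation; the class name `PartialConvBlock`, the `conv_ratio` value, and the layer choices are all assumptions.

```python
# Hypothetical sketch of a "partial feature extraction" style block.
# NOT the paper's PFE block: a FasterNet-style partial convolution that
# convolves only a fraction of the channels to limit redundant computation.
import torch
import torch.nn as nn

class PartialConvBlock(nn.Module):
    def __init__(self, dim: int, conv_ratio: float = 0.25):
        super().__init__()
        self.dim_conv = int(dim * conv_ratio)   # channels that get convolved
        self.dim_pass = dim - self.dim_conv     # channels passed through as-is
        self.conv = nn.Conv2d(self.dim_conv, self.dim_conv,
                              kernel_size=3, padding=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Split channels: transform a subset, forward the remainder untouched.
        x_conv, x_pass = torch.split(x, [self.dim_conv, self.dim_pass], dim=1)
        return torch.cat([self.conv(x_conv), x_pass], dim=1)
```

Likewise, the exact language-driven loss is defined only in the paper; what follows is a minimal sketch of the general mechanism the abstract describes: a frozen CLIP model embeds the fused image and a natural-language prompt into a shared space, and a cosine-distance term pulls the fused image toward the prompt. The prompt text, input resolution, and the use of OpenAI's public CLIP release with its standard normalization constants are assumptions, not the paper's stated configuration.

```python
# Minimal sketch of a CLIP-based language-driven fusion loss (illustrative;
# the paper's actual loss formulation may differ).
import torch
import torch.nn.functional as F
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)
model = model.float().eval()          # keep CLIP frozen in fp32
for p in model.parameters():
    p.requires_grad_(False)           # freeze weights; grads still reach input

# Hypothetical prompt describing the desired fused result.
prompt = clip.tokenize(
    ["an image with salient thermal targets and rich visible textures"]
).to(device)
with torch.no_grad():
    txt_feat = model.encode_text(prompt)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)

# Normalization statistics used by OpenAI's CLIP preprocessing.
CLIP_MEAN = torch.tensor([0.48145466, 0.4578275, 0.40821073],
                         device=device).view(1, 3, 1, 1)
CLIP_STD = torch.tensor([0.26862954, 0.26130258, 0.27577711],
                        device=device).view(1, 3, 1, 1)

def clip_language_loss(fused: torch.Tensor) -> torch.Tensor:
    """1 - cosine similarity between the fused image and the text prompt.

    `fused`: batch of fused images in [0, 1], shape (B, 3, H, W); a
    single-channel fused image can be repeated to three channels first.
    Gradients flow through CLIP's image encoder into the fusion network.
    """
    x = F.interpolate(fused, size=(224, 224), mode="bilinear",
                      align_corners=False)
    x = (x - CLIP_MEAN) / CLIP_STD
    img_feat = model.encode_image(x)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    return (1.0 - (img_feat * txt_feat).sum(dim=-1)).mean()
```

In practice such a term would be weighted against conventional intensity and gradient fusion losses; the weighting and prompt design used by the authors are not specified here.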
Source journal: IEEE Transactions on Consumer Electronics
CiteScore: 7.70
Self-citation rate: 9.30%
Articles published per year: 59
Average review time: 3.3 months
Journal description: The main focus for the IEEE Transactions on Consumer Electronics is the engineering and research aspects of the theory, design, construction, manufacture or end use of mass market electronics, systems, software and services for consumers.