{"title":"CLIP-Based Natural Language-Guided Low-Redundancy Fusion of Infrared and Visible Images","authors":"Jundong Zhang;Kangjian He;Dan Xu;Hongzhen Shi","doi":"10.1109/TCE.2025.3526792","DOIUrl":null,"url":null,"abstract":"The objective of infrared and visible image fusion is to produce a fused image that encompasses significant objects and intricate textures. However, existing methods frequently prioritize the extraction of complementary information, often overlooking the detrimental effects of redundant features. Moreover, due to the absence of authentic fused images, traditional mathematically defined loss functions face challenges in accurately modeling the characteristics of fused images. To address these challenges, this paper utilizes CLIP to design a natural language-guided, low-redundancy feature infrared and visible image fusion network. On one hand, we designed a Partial Feature Extraction(PFE) block and a Spatial-Channel Reconstruction Screening(SCRS) block to effectively reduce redundant features and enhance the focus on critical features. Additionally, we leveraged the CLIP model to bridge the gap between images and natural language, innovatively crafting a language-driven loss function to guide the fusion process through linguistic expressions. Extensive experiments conducted on multiple public datasets demonstrate that this method outperforms existing advanced techniques in both visual quality and quantitative assessment. Moreover, it achieves superior detection accuracy compared to current methods, reaching an advanced level of performance. The source code will be released at <uri>https://github.com/VCMHE/CNLFusion</uri>.","PeriodicalId":13208,"journal":{"name":"IEEE Transactions on Consumer Electronics","volume":"71 1","pages":"931-944"},"PeriodicalIF":4.3000,"publicationDate":"2025-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Consumer Electronics","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10829832/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Abstract
The objective of infrared and visible image fusion is to produce a fused image that preserves both significant objects and intricate textures. However, existing methods frequently prioritize the extraction of complementary information while overlooking the detrimental effects of redundant features. Moreover, because authentic ground-truth fused images do not exist, traditional mathematically defined loss functions struggle to accurately model the characteristics of fused images. To address these challenges, this paper uses CLIP to design a natural language-guided, low-redundancy infrared and visible image fusion network. On the one hand, we design a Partial Feature Extraction (PFE) block and a Spatial-Channel Reconstruction Screening (SCRS) block to reduce redundant features and sharpen the focus on critical ones. On the other hand, we leverage the CLIP model to bridge the gap between images and natural language, crafting a language-driven loss function that guides the fusion process through linguistic expressions. Extensive experiments on multiple public datasets demonstrate that the method outperforms existing advanced techniques in both visual quality and quantitative assessment, and it achieves superior detection accuracy compared to current methods. The source code will be released at https://github.com/VCMHE/CNLFusion.
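As a rough illustration of the language-driven loss idea described above, the sketch below uses OpenAI's public CLIP package to score a fused image against a fixed text prompt and penalize their embedding distance. The prompt wording, the cosine-distance form, and all identifiers here are assumptions chosen for illustration; they are not the paper's actual loss formulation.

```python
# Minimal sketch of a CLIP-based language-driven loss (assumed formulation,
# not the paper's exact definition). Requires: pip install torch and
# pip install git+https://github.com/openai/CLIP.git
import torch
import clip


class LanguageDrivenLoss(torch.nn.Module):
    def __init__(self,
                 prompt="a clear fused image with salient targets and rich textures",
                 device="cuda" if torch.cuda.is_available() else "cpu"):
        super().__init__()
        # Load a frozen CLIP model; its weights are never updated, it only
        # provides a shared image-text embedding space to measure the loss in.
        self.model, _ = clip.load("ViT-B/32", device=device)
        for p in self.model.parameters():
            p.requires_grad_(False)
        # Encode the text prompt once and cache its normalized embedding.
        with torch.no_grad():
            tokens = clip.tokenize([prompt]).to(device)
            text_feat = self.model.encode_text(tokens)
            self.text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

    def forward(self, fused):
        # `fused` is assumed to be a batch of 3-channel images already resized
        # to 224x224 and normalized with CLIP's statistics (preprocessing omitted).
        img_feat = self.model.encode_image(fused)
        img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
        # Cosine distance: pushing this down pulls the fused image's embedding
        # toward the linguistic description of a desirable fusion result.
        return (1.0 - (img_feat * self.text_feat).sum(dim=-1)).mean()
```

Because the CLIP weights are frozen, gradients flow only into the fused image (and hence into the fusion network that produced it); in practice such a term would be weighted and combined with conventional intensity/gradient losses.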
Journal Introduction:
The main focus of the IEEE Transactions on Consumer Electronics is the engineering and research aspects of the theory, design, construction, manufacture, or end use of mass-market electronics, systems, software, and services for consumers.