PCFFusion: Progressive cross-modal feature fusion network for infrared and visible images
Shuying Huang, Kai Zhang, Yong Yang, Weiguo Wan
Pattern Recognition, vol. 172, Article 112419, 2025. DOI: 10.1016/j.patcog.2025.112419
Abstract
Infrared and visible image fusion (IVIF) aims to fuse the thermal target information in infrared images and the spatial texture information in visible images, improving the observability and comprehensibility of the fused images. Currently, most IVIF methods suffer from the loss of salient target information and texture details in the fused images. To alleviate this problem, a progressive cross-modal feature fusion network (PCFFusion) for IVIF is proposed, which comprises two stages: feature extraction and feature fusion. In the feature extraction stage, to enhance the network's feature representation capability, a feature decomposition module (FDM) is constructed to extract two modal features at different scales by defining a feature decomposition operation (FDO). In addition, by establishing correlations between the high-frequency and low-frequency components of the two modal features, a cross-modal feature enhancement module (CMFEM) is built to correct and enhance the two features at each scale. The feature fusion stage achieves the fusion of the two modal features at each scale and the supplementation of adjacent-scale features by constructing three cross-domain fusion modules (CDFMs). To constrain the fused results to preserve more salient targets and richer texture details, a dual-feature fidelity loss function is defined, in which a salient weight map is constructed to balance the two loss terms. Extensive experiments demonstrate that the fusion results of the proposed method highlight prominent targets from the infrared images while retaining rich background details from the visible images, and that the performance of PCFFusion is superior to several advanced methods. Specifically, compared with the best results obtained by the comparison methods, the proposed network achieves average improvements of 30.35% and 10.9% in mutual information (MI) and standard deviation (SD), respectively, on the TNO dataset.
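The dual-feature fidelity loss balanced by a salient weight map is the most concrete mechanism the abstract describes. The PyTorch sketch below shows one plausible way such a loss could be built; it is an illustration under assumptions, not the authors' implementation: the saliency map here is simply the normalized infrared intensity, the gradient term uses Sobel filters, and the function names (saliency_weight_map, dual_feature_fidelity_loss) are hypothetical.

import torch
import torch.nn.functional as F

def saliency_weight_map(ir: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Hypothetical salient weight map: normalize the infrared intensity to [0, 1]
    # so that hot (bright) regions receive larger weights. The paper's exact
    # construction may differ.
    w = ir - ir.amin(dim=(-2, -1), keepdim=True)
    return w / (w.amax(dim=(-2, -1), keepdim=True) + eps)

def dual_feature_fidelity_loss(fused: torch.Tensor,
                               ir: torch.Tensor,
                               vis: torch.Tensor) -> torch.Tensor:
    # Sketch of a dual-feature fidelity loss: salient (thermal) regions are
    # pulled toward the infrared image and the remaining regions toward the
    # visible image, plus a gradient term that encourages texture retention.
    # Inputs are assumed to be single-channel tensors of shape (B, 1, H, W).
    w = saliency_weight_map(ir)
    intensity_term = (w * (fused - ir).abs() + (1 - w) * (fused - vis).abs()).mean()

    # Sobel gradients; texture fidelity against the stronger source gradient.
    kx = torch.tensor([[[[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]]],
                      device=fused.device, dtype=fused.dtype)
    ky = kx.transpose(-1, -2)

    def grad(x: torch.Tensor) -> torch.Tensor:
        return torch.cat([F.conv2d(x, kx, padding=1),
                          F.conv2d(x, ky, padding=1)], dim=1).abs()

    texture_term = (grad(fused) - torch.maximum(grad(ir), grad(vis))).abs().mean()
    return intensity_term + texture_term

With fused, ir, and vis given as (B, 1, H, W) tensors in [0, 1], dual_feature_fidelity_loss(fused, ir, vis) returns a scalar that can be minimized directly; the weight map trades off the two fidelity terms per pixel, which mirrors the abstract's description of using a salient weight map to balance the loss terms.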
About the journal
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.